Multilevel logistic regression with glmer() and rstanarm
This tutorial uses rstanarm version 2.21.1 and requires the following R packages.
knitr::opts_chunk$set(echo = TRUE, comment = "", cache = T)
# Required Packages
# install.packages("skimr") # Or install Skimr package
library(bookdown)
library(skimr)
library(mlmRev)
library(lme4)
library(lmtest)
library(rstanarm)
library(ggplot2)
library(dplyr)
library(kableExtra)
We will be analyzing the guImmun dataset (Pebley et al., 1996) from the mlmRev package in R. The data include 2159 children (1-4 years of age) who had received some immunization. These 2159 children came from 1595 families living in 161 communities, with an average of 1.4 children per family and 13.4 children per community. The guImmun dataset consists of the following 13 variables:
# Use example dataset from mlmRev package: Immunization in Guatemala
data(guImmun, package = "mlmRev")
str(guImmun)
'data.frame': 2159 obs. of 13 variables:
$ kid : Factor w/ 2159 levels "2","269","272",..: 1 2 3 4 5 6 7 8 9 10 ...
$ mom : Factor w/ 1595 levels "2","185","186",..: 1 2 3 4 5 5 6 7 7 8 ...
$ comm : Factor w/ 161 levels "1","36","38",..: 1 2 2 2 2 2 2 2 2 2 ...
$ immun : Factor w/ 2 levels "N","Y": 2 1 1 1 1 2 2 2 2 2 ...
$ kid2p : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 1 2 2 ...
$ mom25p : Factor w/ 2 levels "N","Y": 1 1 1 1 2 1 1 2 2 2 ...
$ ord : Factor w/ 4 levels "01","23","46",..: 1 2 1 2 3 2 2 4 3 3 ...
$ ethn : Factor w/ 3 levels "L","N","S": 1 1 1 1 1 1 1 1 1 1 ...
$ momEd : Factor w/ 3 levels "N","P","S": 3 2 2 2 2 2 3 2 2 2 ...
$ husEd : Factor w/ 4 levels "N","P","S","U": 3 2 3 2 4 4 2 2 2 3 ...
$ momWork: Factor w/ 2 levels "N","Y": 1 2 2 2 2 2 2 2 2 2 ...
$ rural : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ pcInd81: num 0.1075 0.0437 0.0437 0.0437 0.0437 ...
In this example, we are going to model the probability that these 2159 children had received a full set of immunizations immun as a function of individual characteristics such as kid2p, mom25p, and ord, family characteristics such as ethn, momEd, husEd, and momWork, and community characteristics such as rural and pcInd81.
# Count unique communities, families, and kids
K <- length(unique(guImmun$comm))
J <- length(unique(guImmun$mom))
I <- length(unique(guImmun$kid))
kids | moms | communities |
---|---|---|
2159 | 1595 | 161 |
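The average cluster sizes quoted above can be checked directly from these counts:
# average number of children per family and per community
round(c(kids_per_family = I / J, kids_per_community = I / K), 1)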
The content of this tutorial is as follows.
In section 2 we introduce Model 1, a random-intercept logistic regression with level-1 (child-level) predictors.
We use the glmer()
function from the lme4
package to estimate the model parameters by maximum likelihood (ML) using adaptive quadrature. Then we use the ranef()
function from the lme4
package to assign values to the random intercepts for individual families by employing empirical Bayes (EB). Specifically, the predictions are the modes of the conditional posterior densities of the random intercepts, conditional on the parameters being equal to the MLEs.
In section 3, we employ a fully Bayesian (FB) approach to obtain estimates (posterior means and medians) and associated measures of uncertainty for all parameters, including the random intercepts, for Model 1. We compare ML estimates of model parameters and EB predictions of random intercepts with their FB counterparts.
In section 4, we extend Model 1 to include not just level-1 covariates but also level-2 covariates, as well as cross-level interactions (Model 2). We fit Model 2 using both the rstanarm package and the lme4 package. In addition, we show how to access a library of plotting functions for estimates from the rstanarm package.
In section 5, we fit a three-level random-intercept model (Model 3) with level-1, level-2, and level-3 predictors using both packages.
In section 6, we fit our last model (Model 4), which is a three-level random-coefficient model, using both packages.
To avoid potential confusion, we now clarify how the terminology differs between the frequentist and Bayesian approaches.
Remark 1: Estimation vs. Prediction. Although both frequentists and Bayesians use the term “estimation” when they estimate model parameters, frequentists do not say “estimation” when assigning values to random effects because they do not view random effects as model parameters. They instead use the term “prediction”.
Remark 2: Whether model parameters include random intercepts. Bayesians treat \(\zeta_j\) as random parameters, whereas frequentists treat \(\zeta_j\) as random variables, since frequentists do not accept “random” parameters. Therefore, frequentists call them latent variables or random effects. For Bayesians, both the regression coefficients \(\beta\) and the random intercepts \(\zeta_j\) are parameters, but an important distinction between them is that the intercepts \(\zeta_{j}\) vary between clusters and are therefore sometimes referred to as varying intercepts.
Fitting Model 1 with glmer()
In this section, we start with a two-level random-intercept model with level-1 predictors (Model 1).
In the first model, we include a family-specific random intercept \(\zeta_j\) in the linear predictor to obtain a random-intercept logistic regression model. The random effects represent unobserved heterogeneity between families (level-2) and induce dependence between children within families (level-1). The model can be written as: \[\textrm{logit}\{Pr(immun_{ij}=1|\boldsymbol{x}_{ij}, \zeta_j)\} = \beta_1 + \beta_2 kid2p_{ij} + \beta_3 mom25p_{ij} +\beta_4 ord23_{ij} +\beta_5 ord46_{ij} +\beta_6 ord7p_{ij}+ \zeta_j \] where the random intercepts \(\zeta_j|\boldsymbol{x}_{ij} \sim N(0, \psi)\) are assumed to be independent of each other, identically distributed across families \(j\), and independent of the covariates \(\boldsymbol{x}_{ij}\).
This is an example of a generalized linear mixed model (GLMM) because it is a generalized linear model with mixed (fixed and random) effects, i.e., fixed effects \(\beta_1\) to \(\beta_6\) and a random effect \(\zeta_j\).
Using the latent-response formulation, the model can equivalently be written as \[ y^*_{ij} = \beta_1 + \beta_2 kid2p_{ij} + \beta_3 mom25p_{ij} +\beta_4 ord23_{ij} +\beta_5 ord46_{ij} +\beta_6 ord7p_{ij}+ \zeta_j +\epsilon_{ij}\] where \(\zeta_j \sim N(0, \psi)\) and the \(\epsilon_{ij}\) follow standard logistic distributions. The binary responses \(y_{ij}\) are determined by the latent continuous responses via the threshold model \[ y_{ij} = \left\{ \begin{array}\\ 1 & \mbox{if } \ y^*_{ij} > 0 \\ 0 & \mbox{otherwise} \\ \end{array} \right. \]
In both formulations of the model (via a logit link or in terms of a latent response), it is assumed that the random intercepts are independent across families and independent of the covariates of children \(i\). It is also assumed that the covariates of other children do not affect the response probability of a given child, conditional on the child’s own covariates and the random intercepts (known as the strict exogeneity conditional on the random intercepts). For the latent response formulation, the \(\epsilon_{ij}\) are assumed to be independent across both children and families and independent of both \(\zeta_j\) and \(x_{ij}\).
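To make the latent-response formulation concrete, the small sketch below (our own addition, with hypothetical parameter values) simulates binary responses from a model of this form: family random intercepts are drawn from \(N(0, \psi)\), level-1 errors from a standard logistic distribution, and the response is 1 whenever the latent response exceeds zero.
# illustrative simulation of the latent-response formulation (hypothetical values)
set.seed(1)
n_fam <- 1000; n_kid <- 2                       # assumed numbers of families and children per family
psi   <- 1.5                                    # assumed random-intercept variance
beta1 <- -1; beta2 <- 1.2                       # assumed intercept and level-1 coefficient
fam   <- rep(1:n_fam, each = n_kid)             # family identifier
x     <- rbinom(n_fam * n_kid, 1, 0.5)          # a binary child-level covariate
zeta  <- rnorm(n_fam, 0, sqrt(psi))[fam]        # family-specific random intercepts
ystar <- beta1 + beta2 * x + zeta + rlogis(n_fam * n_kid)  # latent continuous response
y     <- as.numeric(ystar > 0)                  # threshold model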
Below we use the glmer()
function to estimate a random-intercept logistic regression model with kid2p (Yes or No), mom25p (Yes or No), the three ord dummy variables (with level ‘01’ as the reference group) as child-level categorical covariates, and a random intercept for mom, a factor identifying the family.
The syntax for glmer()
is similar to that of lmer()
for linear models. That is, the glmer()
function with family = gaussian(link = "identity")
is equivalent to lmer()
. Our random-intercept logistic regression models can be fitted using glmer()
by specifying a logit link and a binomial distribution with the family = binomial("logit")
option. The parameters are estimated by approximate maximum likelihood, based on an adaptive Gaussian Hermite quadrature approximation of the likelihood. We specify the number of integration points with the nAGQ=n
option. By choosing 30 integration points, we obtain a more accurate approximation of the log-likelihood than the default Laplace approximation, which is equivalent to nAGQ=1
. The default Laplace approximation is one of the fastest computational methods, but its accuracy can be low. We can address this by increasing the number of quadrature points (i.e., using the adaptive Gaussian Hermite quadrature approximation). The next R chunk shows that the difference in point estimates of the model parameters between the model estimated using the adaptive Gaussian Hermite quadrature approximation and the model estimated using the Laplace approximation is large. However, for models with more than one random effect, the lme4
package supports only the Laplace approximation, which may perform poorly.
To solve this limitation, the GLMMadaptive
package can be employed to fit models with multiple random effects (e.g., random intercepts, random slopes, and random quadratic slopes). However, this package can only fit generalized linear mixed models for a single grouping factor. In other words, although this package implements MLE with adaptive Gaussian quadrature, it can only do so for two-level models. In this tutorial, we fit a three-level, random-coefficient logistic regression model in section 6 using the rstanarm
package, which solves this problem.
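As an aside, a sketch of how Model 1 could be fitted with adaptive quadrature in the GLMMadaptive package is shown below (our own illustration, not run in this tutorial); mixed_model() takes separate fixed and random formulas, and nAGQ sets the number of quadrature points.
# sketch (not run): Model 1 fitted with GLMMadaptive's adaptive Gaussian quadrature
# library(GLMMadaptive)
# M1_ad <- mixed_model(fixed = immun ~ kid2p + mom25p + ord,
#                      random = ~ 1 | mom,
#                      data = guImmun,
#                      family = binomial("logit"),
#                      nAGQ = 30)
# summary(M1_ad)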
# Adaptive Gauss-Hermite Quadrature, nAGQ = 30
M1 <- glmer(immun ~ kid2p + mom25p + ord + (1 | mom),
family = binomial("logit"),
data = guImmun, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 30)
# Laplace approximation
M1_La <- glmer(immun ~ kid2p + mom25p + ord + (1 | mom),
family = binomial("logit"),
data = guImmun, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 1)
# Compare parameter estimates between the model estimated using the adaptive Gaussian Hermite quadrature approximation and the model estimated using the Laplace approximation of the likelihood
summary(M1_La, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + (1 | mom)
Data: guImmun
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
2855.4 2895.2 -1420.7 2841.4 2152
Scaled residuals:
Min 1Q Median 3Q Max
-1.3462 -0.7192 -0.3725 0.7657 2.0377
Random effects:
Groups Name Variance Std.Dev.
mom (Intercept) 1.597 1.264
Number of obs: 2159, groups: mom, 1595
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.11563 0.18853 -5.917 3.27e-09 ***
kid2pY 1.21701 0.15426 7.889 3.03e-15 ***
mom25pY -0.05152 0.15967 -0.323 0.747
ord23 -0.16469 0.16986 -0.970 0.332
ord46 -0.10636 0.20440 -0.520 0.603
ord7p -0.11084 0.25284 -0.438 0.661
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(M1, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 30) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + (1 | mom)
Data: guImmun
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
2777.3 2817.0 -1381.6 2763.3 2152
Scaled residuals:
Min 1Q Median 3Q Max
-1.7427 -0.4876 -0.2043 0.5166 2.5810
Random effects:
Groups Name Variance Std.Dev.
mom (Intercept) 6.836 2.615
Number of obs: 2159, groups: mom, 1595
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.4937 0.2708 -5.517 3.45e-08 ***
kid2pY 1.6845 0.2168 7.770 7.86e-15 ***
mom25pY -0.1298 0.2329 -0.557 0.577
ord23 -0.3140 0.2354 -1.334 0.182
ord46 -0.1798 0.2935 -0.612 0.540
ord7p -0.1017 0.3664 -0.278 0.781
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The summary()
function displays the estimated fixed effects and associated standard errors. We apply this function with models M1_La
and M1
to compare estimates and associated standard errors from the Laplace approximation and the adaptive Gaussian Hermite quadrature approximation. The comparison shows that the fixed-effect estimates from M1
are larger in absolute value, and have larger standard errors, than those from M1_La
.
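To make this comparison easier to see, the following small sketch (our own addition) collects the fixed-effect estimates and standard errors from both fits into a single table:
# side-by-side fixed-effect estimates and standard errors from the two approximations
comp <- data.frame(Laplace_Est = fixef(M1_La),
                   Laplace_SE  = sqrt(diag(vcov(M1_La))),
                   AGQ30_Est   = fixef(M1),
                   AGQ30_SE    = sqrt(diag(vcov(M1))))
round(comp, 3)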
We then use the print()
function, which gives a more concise summary of the parameters than the summary()
function.
print(M1, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 30) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + (1 | mom)
Data: guImmun
AIC BIC logLik deviance df.resid
2777.284 2817.026 -1381.642 2763.284 2152
Random effects:
Groups Name Std.Dev.
mom (Intercept) 2.615
Number of obs: 2159, groups: mom, 1595
Fixed Effects:
(Intercept) kid2pY mom25pY ord23 ord46 ord7p
-1.4937 1.6845 -0.1298 -0.3140 -0.1798 -0.1017
From the M1
output, we obtain the estimated regression coefficients in the usual format. To interpret the point estimates of the fixed part, the log odds of receiving the full set of immunizations for first-born children who were less than 2 years old and whose mother was less than 25 years old are estimated as \(\hat{\beta_1}=-1.494\). \(\hat{\beta_2}=1.68\) indicates that the estimated conditional log odds for a given family are higher if the child was 2 years of age or older, controlling for all other covariates. Instead of considering changes in log odds, we can exponentiate the regression coefficients to obtain odds ratios, which are easier to interpret:
# display odds ratios
exp(fixef(M1))
(Intercept) kid2pY mom25pY ord23 ord46 ord7p
0.2245292 5.3900043 0.8782654 0.7305449 0.8354437 0.9033166
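We can also attach approximate 95% intervals to these odds ratios by exponentiating Wald confidence intervals for the coefficients (a quick sketch; Wald intervals are convenient here, although profile-likelihood intervals are generally more accurate):
# odds ratios with approximate 95% Wald confidence intervals
exp(cbind(OR = fixef(M1), confint(M1, parm = "beta_", method = "Wald")))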
In this section, we want to assign values to the random intercepts \(\zeta_j\) for individual families \(j\). By using the ranef()
function, we can obtain the empirical Bayes modal (MAP) predictions of the random intercepts for families. We name the MAP predictions EB_PMode
and the standard errors MAP_SE
in the following R chunk. The MAP predictions are based on the conditional posterior distribution of the random intercepts, conditional on all model parameters being equal to their ML estimates, \(P(\zeta_j|y_{1j}, ..., y_{n_{j}j}, \boldsymbol{X}_{j};\boldsymbol{\hat\beta}^{ML}_j,\hat\psi^{ML})\). The standard errors MAP_SE
are the standard deviations of a normal density whose logarithm has the same curvature at the mode as the conditional posterior distribution.
# display random intercepts assigned for the corresponding moms
head(ranef(M1)$mom)
# obtain the empirical Bayes modal (MAP) predictions
EB_PMode <- ranef(M1, condVar = TRUE)
# compute the corresponding standard errors
MAP_SE <- sqrt(attr(EB_PMode[[1]], "postVar")[1, , ])
head(MAP_SE)
[1] 1.830832 1.821254 1.779558 1.821254 1.244001 1.788154
It is also useful to compare the conditional modes visually, which we do with the caterpillar plot below. The blue dots are the conditional modes with 95% confidence intervals (using \(\pm 1.96\) standard errors). We display the random effects in order from smallest to largest. As there are 1595 families, we suppress their IDs by using the scales = list(y = list(alternating = 0))
argument. Therefore, the y-axis of mom’s ID is not displayed in the caterpillar plot.
# display the caterpillar plot
lattice::dotplot(ranef(M1, which = "mom", condVar = TRUE), scales = list(y = list(alternating = 0)))
$mom
Fitting Model 1 with rstanarm
In the previous section, we fitted the random-intercept logistic regression by using glmer()
, which estimates the regression coefficients and standard deviation of random effects by approximate maximum likelihood. We specified 30 integration points so that the adaptive quadrature approximation is sufficiently accurate. However, when our models have more than a single random effect, glmer()
could work poorly as it only supports the Laplace approximation. Joe (2008) showed that these Laplace estimates work especially poorly for models with binary responses and small cluster sizes, which is the case here. We also found large discrepancies between estimates based on Laplace and adaptive quadrature for Model 1.
A fully Bayesian (FB) approach overcomes these problems and produces estimates for all model parameters, including the random intercepts together with their associated uncertainties.
In this section, we show how to fit and evaluate Model 1 by using the rstanarm
package. To prevent the results from being subjective (overly influenced by the priors), the default priors in rstanarm
are designed to be weakly informative (see Prior Distributions for rstanarm Models for more details). Weakly informative priors also regularize parameter estimates and help avoid, for instance, separation in logistic regression (Gelman et al., 2008).
The rstanarm package
The rstanarm package is a wrapper for the rstan
package that enables the most commonly used (multilevel) regression models to be specified and fit in rstan
using customary R modeling syntax. Educational researchers can use the Bayesian approach for multilevel models with only minimal changes to their existing glmer()
syntax. We use the chains=6
argument to set the number of Markov chains to 6 and the iter=4000
argument to set the number of iterations for each chain to 4,000. The first half of these iterations, known as the warmup, is discarded. Since we start each chain with arbitrary starting values for the parameters, we need a sufficiently long warmup phase to reach the stationary distribution. By using the print()
function, we can display a quick summary of the FB estimates of the model parameters, excluding the random intercepts.
M1_stanglmer <- stan_glmer(immun ~ kid2p + mom25p + ord + (1 | mom),
family = binomial("logit"),
data = guImmun,
iter = 4000,
chains = 6,
seed = 349)
# display the median and the median absolute deviation (MAD) of the posterior draws
print(M1_stanglmer, digits = 2)
stan_glmer
family: binomial [logit]
formula: immun ~ kid2p + mom25p + ord + (1 | mom)
observations: 2159
------
Median MAD_SD
(Intercept) -1.51 0.27
kid2pY 1.71 0.22
mom25pY -0.14 0.23
ord23 -0.33 0.24
ord46 -0.19 0.30
ord7p -0.10 0.36
Error terms:
Groups Name Std.Dev.
mom (Intercept) 2.7
Num. levels: mom 1595
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
The print()
function displays the median and the median absolute deviation (MAD) of the posterior draws. One great advantage of Bayesian estimation is that we can choose any summaries of the posterior distribution. For example, we can use the summary()
function to obtain the mean and standard deviation of the marginal posterior distribution for each parameter as shown below. In addition, users can specify which parameters to display by using the pars
argument in the summary()
function. Moreover, 95% credible intervals, which have a 95% probability of containing the true value of the parameter given the data, can be obtained by considering the \(2.5^{th}\) to \(97.5^{th}\) percentiles of the posterior draws. In addition, by using the probs
argument, we can specify which quantiles of the posterior distribution to display.
# display the mean and standard deviation of the marginal posterior distribution of each parameter
summary(M1_stanglmer,
pars = c("(Intercept)", "kid2pY", "mom25pY", "ord46", "ord7p", "b[(Intercept) mom:2]", "b[(Intercept) mom:185]", "b[(Intercept) mom:186]"),
probs = c(0.025, 0.975),
digits = 2)
Model Info:
function: stan_glmer
family: binomial [logit]
formula: immun ~ kid2p + mom25p + ord + (1 | mom)
algorithm: sampling
sample: 12000 (posterior sample size)
priors: see help('prior_summary')
observations: 2159
groups: mom (1595)
Estimates:
mean sd 2.5% 97.5%
(Intercept) -1.51 0.27 -2.06 -0.99
kid2pY 1.71 0.22 1.30 2.16
mom25pY -0.14 0.23 -0.60 0.32
ord46 -0.19 0.30 -0.79 0.40
ord7p -0.10 0.37 -0.83 0.62
b[(Intercept) mom:2] 1.74 2.04 -1.93 6.14
b[(Intercept) mom:185] -1.73 2.04 -6.07 1.90
b[(Intercept) mom:186] -1.87 1.96 -6.05 1.72
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.00 1.00 5836
kid2pY 0.00 1.00 4984
mom25pY 0.00 1.00 5485
ord46 0.00 1.00 4475
ord7p 0.01 1.00 4925
b[(Intercept) mom:2] 0.02 1.00 13612
b[(Intercept) mom:185] 0.02 1.00 13807
b[(Intercept) mom:186] 0.02 1.00 13810
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
MCMC diagnostics
Under MCMC diagnostics
from the output, values under mcse
represent estimates of the Monte Carlo standard errors (MCSEs), which quantify the uncertainty due to estimating posterior expectations by sample means of Monte Carlo draws from the posterior. That is, the MCSEs represent the standard deviation of estimates obtained for a given dataset by repeatedly using the same MCMC approach to estimate the parameters (but with different starting values and random number sequences). Additionally, the MCMC diagnostics include Rhat
and the effective sample size, n_eff
.
Rhat: potential scale reduction statistic
Rhat
measures the ratio of the average variance of the MCMC iterations within each chain to the variance of the pooled iterations across all chains (after discarding the warmup iterations). If all chains are at equilibrium (have reached the same stationary distribution), these variances will be the same so Rhat
will equal one. The desired value of Rhat is smaller than 1.1. After fitting a Bayesian model, we can also use the bayesplot
package to visualize MCMC diagnostics (see General MCMC diagnostics and vignette for more details). In the next R chunk, we use a generic rhat
function to extract the \(\hat{R}\) values for all model parameters and random effects from M1_stanglmer
and display the first 10 extracted \(\hat{R}\) values. Then, we visualize the \(\hat{R}\) values for all model parameters and random effects by using the mcmc_rhat
function:
library("bayesplot")
rhats <- rhat(M1_stanglmer)
structure(rhats[1:10], my_attribute = "This is a vector")
(Intercept) kid2pY mom25pY
1.0002910 1.0003843 1.0007301
ord23 ord46 ord7p
1.0007943 1.0006602 1.0006004
b[(Intercept) mom:2] b[(Intercept) mom:185] b[(Intercept) mom:186]
1.0000814 0.9999651 1.0000840
b[(Intercept) mom:187]
1.0000155
attr(,"my_attribute")
[1] "This is a vector"
color_scheme_set("brightblue") # see help("color_scheme_set")
mcmc_rhat(rhats)
In the plot, the points representing the \(\hat{R}\) values are colored based on their values by cutoff points 1.1 and 1.05.
Another advantage of rstanarm
is that, in addition to allowing a range of plotting options with the bayesplot
package, it allows the use of the shinystan
package to obtain interactive diagnostics and posterior analysis for the Bayesian model (see shinystan). ShinyStan
provides interactive plots and tables that are helpful for analyzing a posterior sample and can be used to identify potential problems by assessing the performance of the MCMC algorithm or the specification of the model.
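For example, the interactive interface can be launched for our fitted model with a single call (shown as a sketch and not run here, because it opens a browser session):
# launch the interactive ShinyStan interface for M1_stanglmer (not run)
# library(shinystan)
# launch_shinystan(M1_stanglmer)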
Effective sample size
n_eff
is an estimate of the effective number of independent draws from the posterior distribution for the parameter of interest. Because the draws within a chain are not independent if there is autocorrelation, the effective sample size, \(n_{eff}\), will be smaller than the total number of post-warmup draws, \(N\) (i.e., \(N=12,000 = 6\textrm{ chains} \times (4,000\textrm{ iterations} -2,000 \textrm{ iterations for warmup})\) in this case).
Therefore, the larger the ratio of \(n_{eff}\) to \(N\), the better. In the next R chunk, we use the neff_ratio
function to extract the \(n_{eff}/N\) values from the fitted M1_stanglmer
model. Then, we visualize the ratios by using the mcmc_neff
function. The y-axis text is off by default in this plot because the M1_stanglmer
model has too many parameters. To make the labels clear, we can use the yaxis_text
function to display the parameter names alongside their ratios, \(n_{eff}/N\). The next R chunk displays a subset of parameters, focusing on the regression coefficients \(\beta\) and three of the random intercepts \(\zeta_j\).
ratios <- neff_ratio(M1_stanglmer )
ratios_spe <- neff_ratio(M1_stanglmer, pars = c("(Intercept)", "kid2pY", "mom25pY", "ord46", "ord7p", "b[(Intercept) mom:2]", "b[(Intercept) mom:185]", "b[(Intercept) mom:186]"))
mcmc_neff(ratios_spe) + yaxis_text(hjust = 1)
In the above plot, the points representing the ratios, \(n_{eff}/N\), are colored based on cutoff points 0.1 and 0.5.
Comparing stan_glmer() and glmer()
In this section, we compare the point estimates from the glmer()
and stan_glmer()
functions that we name ML (maximum likelihood) and FB (fully Bayesian), respectively.
Comparison
For the regression coefficients, the FB estimate (median of the posterior draws) of the population intercept \(\hat{\beta_1}\) is -1.51, which is close to the ML estimate (-1.49). The FB estimates of the coefficients \(\hat{\beta_2}\) to \(\hat{\beta_6}\) for the covariates \(kid2p_{ij}\), \(mom25p_{ij}\), \(ord23_{ij}\), \(ord46_{ij}\), and \(ord7p_{ij}\) are 1.71, -0.14, -0.33, -0.19, and -0.10, respectively, which are close to the corresponding ML estimates (1.68, -0.13, -0.31, -0.18, -0.10).
The FB estimate \(\sqrt{\hat{\psi}}\) of the random-intercept standard deviation is 2.72, which is larger than the ML estimate (2.61).
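The following small sketch (our own addition) lines up the ML point estimates from glmer() with the FB point estimates from stan_glmer(); for stanreg objects, fixef() returns the point estimates reported by print(), which are posterior medians.
# ML (glmer) versus FB (stan_glmer) point estimates of the regression coefficients
round(cbind(ML = fixef(M1), FB = fixef(M1_stanglmer)), 2)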
In section 2.1.3, we assigned values to the random intercepts by using the posterior modes (MAP) of the conditional posterior distributions, conditional on the parameters being equal to their ML estimates. Corresponding standard errors were based on the curvature of the conditional log-posterior. In this section, by using the summary()
function to display the FB estimates, we are able to get the means of the draws of parameters and random effects from their joint posterior distribution. Moreover, users can directly access the posterior draws (as shown in section 3.2.3). Next, we compare the marginal (with respect to the other parameters) mean of those posterior draws of the random intercepts with the predictions generated by ranef()
.
# Obtain family-level varying intercept zeta_j
## draws for 1595 family-level errors zeta_j
u_sims <- as.matrix(M1_stanglmer,
regex_pars = "b\\[\\(Intercept\\) mom\\:")
# Compute mean, SD, median, and 95% credible interval of varying intercepts
## Posterior mean and SD of each alpha
FB_PMean <- apply(X = u_sims, # posterior mean
MARGIN = 2,
FUN = mean)
FB_Psd <- apply(X = u_sims, # posterior SD
MARGIN = 2,
FUN = sd)
## Posterior median and 95% credible interval
P_quant <- apply(X = u_sims,
MARGIN = 2,
FUN = quantile,
probs = c(0.025, 0.50, 0.975))
FB_Pquant <- data.frame(t(P_quant))
names(FB_Pquant) <- c("Q2.5", "Q50", "Q97.5")
## Combine summary statistics of posterior simulation draws
FB<- data.frame(FB_PMean, FB_Psd , FB_Pquant)
round(head(FB), 2)
The FB table above shows the means of the posterior draws of the random intercepts for the first six families in the FB_PMean
column, the standard deviations in the FB_Psd
column, and the medians and 95% credible intervals in the Q2.5
, Q50
, Q97.5
columns, respectively.
# Obtain the empirical Bayes modal predictions extracted with the ranef function in section 2.1.2 (sometimes called maximum a posteriori (MAP) predictions)
MAP <- EB_PMode[1]
# Obtain the standard errors for the random-effect prediction
MAP_se <- sqrt(attr(EB_PMode[[1]], "postVar")[1, , ])
immun_N <- guImmun %>%
  filter(immun == "N") %>%
  group_by(mom) %>%
  count(immun == "N", name = "num_N", .drop = FALSE)
immun_Y <- guImmun %>%
  filter(immun == "Y") %>%
  group_by(mom) %>%
  count(immun == "Y", name = "num_Y", .drop = FALSE)
immun_diff<-immun_Y[3]-immun_N[3]
a_df <- data.frame(immun_Y[3], immun_N[3], immun_diff, FB_PMean, FB_Psd, MAP, MAP_se)
options(knitr.table.format = "html")
knitr::kable(head(a_df), col.names = c('num_Y', ' num_N ',' Y-N ', ' FB_PMean ', ' FB_PSD ', ' MAP ', ' MAP_se '), caption = "1. Comparison table",linesep = " ", align = "c", digit=2)%>%
kable_styling(latex_options="scale_down") %>%
column_spec(1,width = "3in")
 | num_Y | num_N | Y-N | FB_PMean | FB_PSD | MAP | MAP_se |
---|---|---|---|---|---|---|---|
b[(Intercept) mom:2] | 1 | 0 | 1 | 1.74 | 2.04 | 1.28 | 1.83 |
b[(Intercept) mom:185] | 0 | 1 | -1 | -1.73 | 2.04 | -1.31 | 1.82 |
b[(Intercept) mom:186] | 0 | 1 | -1 | -1.87 | 1.96 | -1.48 | 1.78 |
b[(Intercept) mom:187] | 0 | 1 | -1 | -1.72 | 2.01 | -1.31 | 1.82 |
b[(Intercept) mom:188] | 1 | 1 | 0 | 0.08 | 1.44 | 0.09 | 1.24 |
b[(Intercept) mom:189] | 1 | 0 | 1 | 1.88 | 2.00 | 1.44 | 1.79 |
In Table 1, the FB_PMean
and FB_Psd
columns show the FB posterior mean and standard deviation for each family and the MAP
and MAP_se
columns display the conditional (empirical Bayesian) posterior modes and corresponding standard errors. We will discuss these in more detail and visualize the differences in section 3.2.4.
Next, we make a similar table for some extreme families in which all responses are “Y” or “N”.
knitr::kable(head(a_df[order(a_df$num_Y, -a_df$num_N),]), col.names = c('num_Y', ' num_N ',' Y-N ', ' FB_PMean ', ' FB_PSD ', ' MAP ', ' MAP_se '), caption = "2. Comparison table of families with extreme number of No", digit=2)%>%
kable_styling(latex_options="scale_down") %>%
column_spec(1,width = "3in")
 | num_Y | num_N | Y-N | FB_PMean | FB_PSD | MAP | MAP_se |
---|---|---|---|---|---|---|---|
b[(Intercept) mom:730] | 0 | 3 | -3 | -2.50 | 1.78 | -1.94 | 1.59 |
b[(Intercept) mom:1079] | 0 | 3 | -3 | -2.51 | 1.77 | -1.94 | 1.59 |
b[(Intercept) mom:1485] | 0 | 3 | -3 | -2.52 | 1.75 | -1.94 | 1.59 |
b[(Intercept) mom:1652] | 0 | 3 | -3 | -2.72 | 1.76 | -2.12 | 1.54 |
b[(Intercept) mom:2176] | 0 | 3 | -3 | -2.52 | 1.80 | -1.94 | 1.59 |
b[(Intercept) mom:2282] | 0 | 3 | -3 | -2.44 | 1.78 | -1.89 | 1.60 |
knitr::kable(head(a_df[order(-a_df$num_Y, a_df$num_N),]), col.names = c('num_Y', ' num_N ',' Y-N ', ' FB_PMean ', ' FB_PSD ', ' MAP ', ' MAP_se '), caption = "3. Comparison table of families with extreme number of Yes", digit=2)%>%
kable_styling(latex_options="scale_down") %>%
column_spec(1,width = "3in")
 | num_Y | num_N | Y-N | FB_PMean | FB_PSD | MAP | MAP_se |
---|---|---|---|---|---|---|---|
b[(Intercept) mom:384] | 3 | 0 | 3 | 2.87 | 1.68 | 2.29 | 1.50 |
b[(Intercept) mom:456] | 3 | 0 | 3 | 3.27 | 1.71 | 2.74 | 1.46 |
b[(Intercept) mom:699] | 3 | 0 | 3 | 3.30 | 1.64 | 2.79 | 1.46 |
b[(Intercept) mom:827] | 3 | 0 | 3 | 3.24 | 1.66 | 2.72 | 1.46 |
b[(Intercept) mom:829] | 3 | 0 | 3 | 3.23 | 1.71 | 2.71 | 1.47 |
b[(Intercept) mom:879] | 3 | 0 | 3 | 3.23 | 1.69 | 2.74 | 1.47 |
Tables 2 and 3 show that the FB posterior means (FB_PMean
) are farther from zero than the conditional posterior mode (MAP
) predictions. As will be shown in section 3.2.4, the difference between FB posterior means and conditional posterior mean (EAP
) predictions is in the same direction but much less pronounced, suggesting that the differences are mostly due to using different summaries (mean versus mode) of skewed posterior distributions.
In addition, in these extreme families, the FB posterior standard deviations (FB_Psd
) are larger than the corresponding standard errors for the conditional MAP predictions. One reason for these discrepancies is that the FB approach takes the parameter uncertainty for the other parameters into account by using the marginal posteriors (marginal over the other parameters), whereas the conditional MAP standard errors are based on the conditional posteriors (conditional on the other parameters being equal to the ML estimates).
For comparison, the following table displays some less extreme families in which the numbers of “Y” and “N” responses are almost the same.
knitr::kable(head(a_df[(immun_diff==0),]), col.names = c('num_Y', ' num_N ',' Y-N ', ' FB_PMean ', ' FB_PSD ', ' MAP ', ' MAP_se '), caption = "4. Comparison table of families with same number of Yes and No responses ", digit=2) %>%
kable_styling(latex_options="scale_down") %>%
column_spec(1,width = "3in")
 | num_Y | num_N | Y-N | FB_PMean | FB_PSD | MAP | MAP_se |
---|---|---|---|---|---|---|---|
b[(Intercept) mom:188] | 1 | 1 | 0 | 0.08 | 1.44 | 0.09 | 1.24 |
b[(Intercept) mom:192] | 1 | 1 | 0 | 0.58 | 1.49 | 0.59 | 1.36 |
b[(Intercept) mom:250] | 1 | 1 | 0 | 0.03 | 1.41 | 0.04 | 1.24 |
b[(Intercept) mom:254] | 1 | 1 | 0 | 0.68 | 1.50 | 0.71 | 1.33 |
b[(Intercept) mom:268] | 1 | 1 | 0 | 0.08 | 1.45 | 0.10 | 1.24 |
b[(Intercept) mom:304] | 1 | 1 | 0 | 0.09 | 1.41 | 0.09 | 1.24 |
As shown in Table 4, when the numbers of “Y” and “N” responses are closer, the discrepancies in predictions and standard errors between the two approaches become smaller. That could be, in part, because the posteriors are more symmetric.
So far, we have shown that the conditional MAP predictions of the random intercepts for families are different from FB estimates. To summarize, for conditional MAP, we obtain the conditional posterior distributions of \(\zeta_j\) by plugging in the ML estimates of the regression coefficients and the random-intercept variance obtained from glmer()
. Then we use the modes of the conditional posterior distributions, conditional on the parameters being equal to their ML estimates. In contrast, in FB we obtain draws from the joint posterior distributions of the random intercepts and all other parameters. The means and standard deviations of the draws of the random intercepts therefore represent marginal posterior means and standard deviations, marginal over all other parameters.
In this section, we use the package bayesplot
, which provides a library of plotting functions for Stan estimates, to visualize the posterior draws. The plots created by bayesplot
are ggplot
objects and can be customized using various functions from the ggplot2
package. Furthermore, bayesplot
offers visual MCMC diagnostics, graphical posterior predictive checking, and more. We do not explore these visualizations further in this tutorial; see the bayesplot website for more information.
library("bayesplot")
library("ggplot2")
library("rstanarm")
library(rstan)
#posterior <- as.array(M1_stanglmer)
#posterior<- as.matrix(M1_stanglmer)
posterior<- as.data.frame(M1_stanglmer)
dim(posterior)
[1] 12000 1602
In the next R chunk, we use the mcmc_dens()
function to display kernel density plots of MCMC draws of a parameter (combining all chains). By using the pars
option, we specify the parameter and display the distribution of the posterior draws for a specific family, mom with id #2, as an example in this tutorial.
color_scheme_set("purple")
p2<-mcmc_dens(posterior, pars = c("b[(Intercept) mom:2]"))
p2
We now compare the FB marginal posterior density of the random intercept for the family whose identifier in variable mom
was \(2\), aka mom 2 (as an example) with the corresponding conditional posterior density, with ML estimates plugged in for the regression coefficients and the random-intercept variances.
First, for our FB estimates, we use the as.data.frame()
function to retrieve the posterior draws of the random effects from M1_stanglmer
. This method enables us to access the draws directly and thus makes it easier to manipulate the plot. For instance, we plot the posterior distribution of the random intercept for a specific family, mom 2. We add a black-solid line to show the mean of posterior draws for mom 2. The 95% credible interval is represented by a black error bar which, unlike frequentist confidence intervals, can be interpreted as having a 95% probability of containing the random intercept.
posterior<- as.data.frame(M1_stanglmer) # retrieve the posterior draws
#print(dimnames(posterior))
FB_posterior_mom2<-posterior$`b[(Intercept) mom:2]`
df <- data.frame(col1 = (1:12000),
col2=FB_posterior_mom2)
p <- ggplot(df)
p + geom_density(aes(FB_posterior_mom2), fill = "lightgray") +
geom_vline(aes(xintercept = mean(FB_posterior_mom2)), linetype = "solid", lwd = 1.5)+
geom_errorbarh(aes(xmin=mean(FB_posterior_mom2)-2*sd(FB_posterior_mom2), xmax =mean(FB_posterior_mom2)+2*sd(FB_posterior_mom2), y=0.02, height = .005), lwd = 1.5)+
annotate(geom="text", x=3, y=0.03, label="FB_mean",color="black")
Secondly, we derive the conditional posterior density of the random intercept for mom 2 with ML estimates plugged in for the regression coefficients and random-intercept variance. This conditional posterior density can be written as \[\textrm{Posterior}(\zeta_2|immun_{22}=1, kid2p_{22}, mom25p_{22}, ord23_{22}, ord46_{22}, ord7p_{22}; \boldsymbol{\hat\beta}^{ML}_j,\hat\psi^{ML})=\\ \frac{\textrm{Prior}(\zeta_2)\times \textrm{Likelihood}(immun_{22}=1|kid2p_{22}, mom25p_{22}, ord23_{22}, ord46_{22}, ord7p_{22},\zeta_2)}{C}\]
The conditional prior is simply a normal density with variance set equal to its ML estimate, and we can evaluate this prior density at a set of points as follows:
# evaluate the normal density function at different values
zeta <- seq(-5, 10, by = .01)
# specify a normal density with variance set equal to its ML estimate
sqrt_psi<-sqrt(6.836)
prior <- dnorm(zeta,
mean = 0,
sd = sqrt_psi)
The likelihood function is logit\(^{-1}[\)linpred\(_j+\zeta_j]\), where linpred\(_j\) represents the “fixed” part of the linear predictor for this specific family, mom 2 (who had one child with \(y_{i2}=1\)). The “fixed” part of the linear predictor can be written as \[linpred_j=\hat\beta_1 + \hat\beta_2 kid2p_{ij} + \hat\beta_3 mom25p_{ij} +\hat\beta_4 ord23_{ij} +\hat\beta_5 ord46_{ij} +\hat\beta_6 ord7p_{ij}\]
linpred <- -1.4937 + 1.6845*1 + (-0.1298)*0 + (-0.3140)*0 + (-0.1798)*0 + (-0.1017)*0
likelihood <- invlogit(linpred+zeta)
It follows from Bayes' theorem that the posterior distribution is proportional to the product of the likelihood and the prior. The normalizing constant is the marginal probability of the response (marginal over the random intercept), which can be written as \[C =\int \hat{Pr}(y_{ij}=1|kid2p_{ij}, mom25p_{ij}, ord23_{ij}, ord46_{ij}, ord7p_{ij}, \zeta_j)\,\phi(\zeta_j; 0, \hat\psi^{ML})\,d\zeta_j\] where \(\phi(\zeta_j; 0, \hat\psi^{ML})\) is the normal density of \(\zeta_j\) with a mean of 0 and a variance of \(\hat\psi^{ML}\). This can be obtained by Monte Carlo integration, sampling \(\zeta_j\) from its conditional prior distribution:
y <- rnorm(100000, sd = sqrt_psi)
normalizing.constant<- mean((invlogit(linpred+ y)))
Lastly, we calculate the product of the prior of the random intercept and the likelihood of the responses for the cluster (for mom 2’s child) given the random intercept and divide this product by the normalizing constant to obtain the posterior density:
EB_posterior <-prior*likelihood/normalizing.constant
Below, this conditional posterior is added to the plot. This conditional posterior density is not a normal density as in linear mixed models, and hence its mode does not equal its mean. Recall that the random intercept predictions obtained by using the ranef()
function are the modes (MAP) of this conditional posterior.
We can calculate the conditional mean (EAP) and the corresponding standard deviation by using the integrate()
function, as shown in the next R chunk.
# EB mean for mom 2: y = 1
x <- rnorm(10000000, sd = sqrt_psi)
normalizing.constant<- mean((invlogit((linpred+x))))
func <- function(x){
x*dnorm(x, sd = sqrt_psi)*(invlogit((linpred+x)))/normalizing.constant
}
integrate(func, lower = -Inf, upper = Inf)
1.653317 with absolute error < 4.1e-08
EX <- integrate(func, lower = -Inf, upper = Inf)[[1]]
func_sd <- function(x){
x^2*dnorm(x, sd = sqrt_psi)*(invlogit((linpred+x)) )/normalizing.constant
}
EX2 <- integrate(func_sd, lower = -Inf, upper = Inf)[[1]]
# conditional sd
EAP_sd <- sqrt(EX2 - EX^2)
To compare the different estimates, the code below adds the conditional EAP to the density plot as a dark blue-dotted vertical line and the corresponding conditional MAP as a lighter, blue-dashed vertical line. The figure shows that the black vertical line (FB_Mean) is further away from zero than the dark blue-dotted vertical line (conditional EAP) and the lighter, blue-dashed vertical line (conditional MAP). However, the distance between the black vertical line (FB_Mean) and the dark blue-dotted vertical line (conditional EAP) is in the same direction but much shorter than the distance between the black vertical line (FB_Mean) and the lighter, blue-dashed vertical line (conditional MAP). This suggests that the main reason for the discrepancy between FB posterior means and conditional MAP predictions is that the posteriors (and conditional posteriors) are asymmetric - skewed to the left for positive FB posterior means and to the right for negative ones. As mentioned, because the FB posterior standard deviation takes parameter uncertainty for the other parameters into account, whereas the conditional EAP and MAP standard errors are based on the conditional posteriors, the 95% credible interval for FB is broader than those for the conditional EAP and MAP, which is why in the figure the black error bar is wider than the blue one and the lighter blue one. Additionally, the conditional MAP standard error is smaller than the conditional EAP standard error because the former is based on the curvature of the conditional log posterior at the mode and is therefore not affected by the thicker tail on the right.
df2 <- data.frame(col1 = zeta, col2 = EB_posterior)
p + geom_density(aes(FB_posterior_mom2), fill = "lightgray") +
  geom_vline(aes(xintercept = mean(FB_posterior_mom2)), linetype = "solid", lwd = 1.5) +
  geom_errorbarh(aes(xmin = mean(FB_posterior_mom2) - 2*sd(FB_posterior_mom2), xmax = mean(FB_posterior_mom2) + 2*sd(FB_posterior_mom2), y = 0.03), colour = "black", height = .005, lwd = 1.5) +
  annotate(geom = "errorbarh", xmin = 6, xmax = 6.5, y = 0.2, height = .005, colour = "black", size = 1.5) +
  annotate(geom = "text", x = 9, y = 0.2, label = "FB_mean", size = 4) +
  geom_line(data = df2, aes(zeta, EB_posterior), colour = "blue") +
  # add the conditional EAP as a dark blue dotted vertical line
  geom_vline(aes(xintercept = EX), linetype = "dotted", colour = "blue", lwd = 1.5) +
  geom_errorbarh(aes(xmin = EX - 2*EAP_sd, xmax = EX + 2*EAP_sd, y = 0.02), colour = "blue", height = .005, lwd = 1.5) +
  annotate(geom = "errorbarh", xmin = 6, xmax = 6.5, y = 0.19, height = .005, colour = "blue", size = 1.5) +
  annotate(geom = "text", x = 9, y = 0.19, label = "EB_mean", size = 4) +
  # add the conditional MAP as a light blue dashed vertical line
  geom_vline(aes(xintercept = a_df$X.Intercept.[1]), linetype = "dashed", colour = "lightblue3", lwd = 1.5) +
  geom_errorbarh(aes(xmin = a_df$X.Intercept.[1] - 2*a_df$MAP_se[1], xmax = a_df$X.Intercept.[1] + 2*a_df$MAP_se[1], y = 0.01), colour = "lightblue3", height = .005, lwd = 1.5) +
  annotate(geom = "errorbarh", xmin = 6, xmax = 6.5, y = 0.18, height = .005, colour = "lightblue3", size = 1.5) +
  annotate(geom = "text", x = 9, y = 0.18, label = "EB_mode", size = 4)
Fitting Model 2 with glmer()
In addition to the child-level covariates that have been added to the first model, we continue to add a series of family-level covariates (namely \(ethnN_{j}\), \(ethnS_{j}\), \(momEdP_{j}\), \(momEdS_{j}\), \(husEdP_{j}\), \(husEdS_{j}\), \(husEdU_{j}\), and \(momWorkY_{j}\)) and cross-level interactions between the level-1 variable indicating children’s age and the level-2 variable indicating mothers’ education levels (\(kid2pY_{ij}\times momEdP_{j}\) and \(kid2pY_{ij}\times momEdS_{j}\)). Model 2 can be written as \[\textrm{logit}\{Pr(immun_{ij}=1|x_{ij}, \zeta_j)\} = \beta_1 + \beta_2 kid2p_{ij} + \beta_3 mom25p_{ij} +\beta_4 ord23_{ij} +\beta_5 ord46_{ij} +\beta_6 ord7p_{ij}+\\ \beta_7 ethnN_{j}+\beta_8 ethnS_{j}+\beta_9 momEdP_{j}+ \beta_{10}momEdS_{j}+ \beta_{11} husEdP_{j} +\beta_{12} husEdS_{j}+\beta_{13} husEdU_{j}+ \beta_{14} momWorkYes_{j}+\\ \beta_{15} kid2pYes_{ij}\times momEdP_{j}+ \beta_{16} kid2pYes_{ij}\times momEdS_{j} +\zeta_j\]
The random intercepts \(\zeta_j \sim N(0, \psi)\) are assumed to be independent of each other and identically distributed across families \(j\) and independent of the covariates in the model.
M2 <- glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +kid2p * momEd + (1 | mom),
family = binomial("logit"),
data = guImmun, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 10)
summary(M2)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
kid2p * momEd + (1 | mom)
Data: guImmun
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
2731.9 2828.4 -1348.9 2697.9 2142
Scaled residuals:
Min 1Q Median 3Q Max
-1.6535 -0.4250 -0.2222 0.4636 2.6119
Random effects:
Groups Name Variance Std.Dev.
mom (Intercept) 6.795 2.607
Number of obs: 2159, groups: mom, 1595
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.23632 0.42995 -5.201 1.98e-07 ***
kid2pY 1.45443 0.29729 4.892 9.97e-07 ***
mom25pY -0.18574 0.23791 -0.781 0.43497
ord23 -0.23704 0.23850 -0.994 0.32028
ord46 0.10312 0.30281 0.341 0.73344
ord7p 0.35860 0.38198 0.939 0.34784
ethnN -0.44176 0.30501 -1.448 0.14752
ethnS -0.36978 0.24345 -1.519 0.12879
momEdP 0.33879 0.38706 0.875 0.38141
momEdS -0.74728 0.80424 -0.929 0.35280
husEdP 0.56626 0.23493 2.410 0.01594 *
husEdS 0.69970 0.41438 1.689 0.09131 .
husEdU -0.05678 0.36983 -0.154 0.87798
momWorkY 0.55080 0.20448 2.694 0.00707 **
kid2pY:momEdP 0.32512 0.39488 0.823 0.41032
kid2pY:momEdS 2.05458 0.83985 2.446 0.01443 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation matrix not shown by default, as p = 16 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
From the output:
The Fixed Effects section displays the point estimates of regression coefficients with estimated standard errors and Wald tests. We can learn from the output which child-level and family-level covariates are statistically significantly associated with the child-level response variable, controlling for the other variables.
In the Random Effects section, the ML estimates provide the estimated standard deviation (\(\sqrt{\hat{\psi}}\approx2.61\)) of the random intercepts. We can learn from the random effects output about the extent to which between-family variability in the response variable is left unexplained by employing Model 2. We can also compare this estimate with the corresponding estimate obtained for Model 1 to quantify the extent to which between-family variability has been explained by the family-level covariates and the cross-level interaction added to the second model.
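As a rough sketch of this comparison (our own addition), we can compute the proportional reduction in the estimated between-family variance from Model 1 to Model 2; given the estimates above (6.84 versus 6.80), the reduction is small.
# proportional reduction in the between-family variance from Model 1 to Model 2
psi_M1 <- as.data.frame(VarCorr(M1))$vcov  # estimated random-intercept variance in Model 1
psi_M2 <- as.data.frame(VarCorr(M2))$vcov  # estimated random-intercept variance in Model 2
(psi_M1 - psi_M2) / psi_M1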
Then, we create the caterpillar plot below to show the distribution of the conditional modes.
lattice::dotplot(ranef(M2, which = "mom", condVar = TRUE), scales = list(y = list(alternating = 0)))
$mom
Fitting Model 2 with rstanarm
In this section, we use stan_glmer()
to estimate Model 2. Similar to Model 1, we use weakly informative prior distributions for the hyper-parameter \(\psi\).
M2_stanglmer <- stan_glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork + kid2p * momEd+(1 | mom),
family = binomial("logit"),
data = guImmun,
seed = 349)
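As mentioned in the overview, the rstanarm fit can be passed straight to the bayesplot plotting functions. For example, the sketch below (our own addition) plots posterior medians with 50% and 90% intervals for the regression coefficients of Model 2; the regular expressions simply select the fixed effects and exclude the family-level intercepts.
# posterior medians with 50% and 90% intervals for the regression coefficients of Model 2
library(bayesplot)
mcmc_intervals(as.matrix(M2_stanglmer),
               regex_pars = "^\\(Intercept\\)|^kid2p|^mom25p|^ord|^ethn|^momEd|^husEd|^momWork")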
Fitting Model 3 with glmer()
We have looked at the two-level logistic model with a random intercept in depth. In this section, we are going to briefly look at how to estimate a three-level logistic model with level-2 random intercepts, \(\zeta^{(2)}_{jk} \sim N(0, \psi^{(2)})\), for families \(j\) and level-3 random intercepts, \(\zeta^{(3)}_{k} \sim N(0, \psi^{(3)})\), for communities \(k\). In Model 3, children are nested within families, and families are nested within communities, so the levels are nested. The model can be written as \[\textrm{logit}\{Pr(immun_{ijk}=1| x, \zeta^{(2)}_{jk}, \zeta^{(3)}_{k})\} = \beta_1 + \beta_2 kid2p_{ijk} + ... +\beta_6 ord7p_{ijk}+\beta_7 ethnN_{jk}+...\\ + \beta_{14} momWorkYes_{jk}+\beta_{15} rural_{k}+\beta_{16} pcInd81_{k} +\zeta^{(2)}_{jk}+\zeta^{(3)}_{k}\]
In glmer
, we do not need to specify the data structure (nested or crossed) because glmer()
figures it out automatically based on the data. We use the same general syntax (1|ID)
to specify intercepts for the cluster identifier ID
, regardless of what other levels there are in the model. In addition to the child-level and family-level covariates, we go further to add two community-level covariates (\(rural_k\) and \(pcInd81_k\)). As mentioned, for a model with more than a single scalar random effect, glmer
only supports a single integration point, which can be specified by nAGQ = 1
and is equivalent to the Laplace approximation which may not be accurate. We can overcome this problem by fitting the model in rstanarm
(as shown in section 5.2).
M3 <- glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork + rural+ pcInd81 + (1 | mom) + (1 | comm),
family = binomial("logit"),
data = guImmun, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 1)
# print the model results without correlations among fixed effects
print(M3, digits = 2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 | comm)
Data: guImmun
AIC BIC logLik deviance df.resid
2747.402 2849.595 -1355.701 2711.402 2141
Random effects:
Groups Name Std.Dev.
mom (Intercept) 1.13
comm (Intercept) 0.72
Number of obs: 2159, groups: mom, 1595; comm, 161
Fixed Effects:
(Intercept) kid2pY mom25pY ord23 ord46 ord7p
-0.947 1.282 -0.128 -0.139 0.174 0.289
ethnN ethnS momEdP momEdS husEdP husEdS
-0.113 -0.035 0.295 0.302 0.395 0.369
husEdU momWorkY ruralY pcInd81
0.015 0.270 -0.649 -0.857
Similar to the glmer
output for Model 1 and Model 2, the output tells us about the family (which is binomial as our response variable is binary) and the link (which is logit). However, while the output for Models 1 and 2 showed only the estimated standard deviation of the random intercepts at the family level (\(\sqrt{\hat{\psi}}\)), the output of Model 3 shows both the between-family within-community standard deviation estimate (\(\sqrt{\hat{\psi}^{(2)}}= 1.13\)) and the between-community standard deviation estimate (\(\sqrt{\hat{\psi}^{(3)}}=0.72\)).
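Using the latent-response formulation, in which the level-1 errors have variance \(\pi^2/3\), we can sketch the intraclass correlations implied by these variance-component estimates (our own illustration of how the output can be used):
# variance components from Model 3 and implied latent-response intraclass correlations
vc      <- as.data.frame(VarCorr(M3))
psi_mom <- vc$vcov[vc$grp == "mom"]   # between-family (within-community) variance
psi_com <- vc$vcov[vc$grp == "comm"]  # between-community variance
total   <- psi_mom + psi_com + pi^2 / 3
c(icc_same_community = psi_com / total,              # children from different families, same community
  icc_same_family    = (psi_mom + psi_com) / total)  # children from the same family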
Now we can plot the conditional posterior modes of the random intercepts for either families or communities. By using the which=
option, we can choose between families (mom
) and communities (comm
).
# for families
lattice::dotplot(ranef(M3, which = "mom", condVar = TRUE), scales = list(y = list(alternating = 0)))
$mom
# for communities
lattice::dotplot(ranef(M3, which = "comm", condVar = TRUE), scales = list(y = list(alternating = 0)))
$comm
Fitting Model 3 with rstanarm
We use stan_glmer()
to fit Model 3 using the code below. We use the default priors and display the median and the median absolute deviation (MAD) of the posterior draws of the model parameters by using the print()
function.
M3_stanglmer <- stan_glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork + rural + pcInd81 + (1 | mom) + (1 | comm),
family = binomial("logit"),
data = guImmun,
seed = 349)
prior_summary(object = M3_stanglmer)
Priors for model 'M3_stanglmer'
------
Intercept (after predictors centered)
~ normal(location = 0, scale = 2.5)
Coefficients
Specified prior:
~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
Adjusted prior:
~ normal(location = [0,0,0,...], scale = [5.95,5.00,5.27,...])
Covariance
~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
print(M3_stanglmer, digits = 2)
stan_glmer
family: binomial [logit]
formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 | comm)
observations: 2159
------
Median MAD_SD
(Intercept) -1.31 0.51
kid2pY 1.83 0.22
mom25pY -0.24 0.24
ord23 -0.29 0.24
ord46 0.18 0.31
ord7p 0.47 0.40
ethnN -0.22 0.53
ethnS -0.11 0.40
momEdP 0.47 0.23
momEdS 0.45 0.52
husEdP 0.57 0.25
husEdS 0.55 0.45
husEdU -0.02 0.38
momWorkY 0.42 0.22
ruralY -0.95 0.33
pcInd81 -1.21 0.54
Error terms:
Groups Name Std.Dev.
mom (Intercept) 2.61
comm (Intercept) 1.15
Num. levels: mom 1595, comm 161
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
Fitting Model 4 with glmer()
Finally, we show an example of a three-level random-coefficient model. Compared to random-intercept models, in which family-specific probability curves are shifted horizontally depending on the values of the random effects, the curves in random-coefficient models are no longer just shifted, but their slope varies depending on the value of the random coefficients. For example, the effect of whether a child was 2 years of age or older at the time of the survey (which means that they were eligible for the vaccine at the time of the intervention) on whether they had received a complete set of immunizations may vary across communities. Such variability can be modeled by adding the term \(\zeta^{(3)}_{2k}kid2pYes_{ijk}\) to Model 3 and renaming the community-level random intercepts \(\zeta^{(3)}_{k}\) to \(\zeta^{(3)}_{1k}\), which leads to the following random-coefficient model \[\textrm{logit}\{Pr(immun_{ijk}=1| x, \zeta^{(2)}_{jk}, \zeta^{(3)}_{1k}, \zeta^{(3)}_{2k})\} = \beta_1 + \beta_2 kid2pYes_{ijk} + ... +\beta_6 ord7p_{ijk}+\beta_7 ethnN_{jk}+...\\ + \beta_{14} momWorkYes_{jk}+\beta_{15} rural_{k}+\beta_{16} pcInd81_{k} +\zeta^{(2)}_{jk}+\zeta^{(3)}_{2k}kid2pYes_{ijk}+\zeta^{(3)}_{1k}\]
Given the covariates \(X\) at all three levels, the random intercept at the family level has a normal distribution with mean zero and variance \(\psi^{(2)}\), \(\zeta^{(2)}_{jk} \sim N(0, \psi^{(2)})\). The random intercept and the random slope at the community level, written as \(\zeta^{(3)}_{1k}\) and \(\zeta^{(3)}_{2k}\), respectively, are bivariate normal with zero means and an unstructured covariance matrix \[\boldsymbol\Psi^{(3)}= \begin{bmatrix} \psi^{(3)}_{11} &\psi^{(3)}_{12} \\ \psi^{(3)}_{21} &\psi^{(3)}_{22} \end{bmatrix} \equiv \\ \begin{bmatrix} Var(\zeta^{(3)}_{1k}|X_k) & Cov(\zeta^{(3)}_{1k}, \zeta^{(3)}_{2k}|X_k) \\ Cov(\zeta^{(3)}_{2k}, \zeta^{(3)}_{1k}|X_k) & Var(\zeta^{(3)}_{2k}|X_k) \\ \end{bmatrix}\] where \(\psi^{(3)}_{21}=\psi^{(3)}_{12}\).
In glmer()
, we use the + operator to “add” this random slope at the community level. The correlation matrix is not shown by default, but we can obtain it by specifying correlation=TRUE
in the print()
function.
M4 <- glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork + rural + pcInd81 + (1 | mom) + (1 + kid2p| comm),
family = binomial("logit"),
data = guImmun, control = glmerControl(optimizer = "bobyqa"),
nAGQ = 1)
summary(M4)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 + kid2p | comm)
Data: guImmun
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
2747.6 2861.1 -1353.8 2707.6 2139
Scaled residuals:
Min 1Q Median 3Q Max
-1.7054 -0.6276 -0.3360 0.6823 2.3462
Random effects:
Groups Name Variance Std.Dev. Corr
mom (Intercept) 1.2835 1.1329
comm (Intercept) 0.9654 0.9825
kid2pY 0.5909 0.7687 -0.68
Number of obs: 2159, groups: mom, 1595; comm, 161
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.08578 0.36296 -2.991 0.00278 **
kid2pY 1.38297 0.19544 7.076 1.48e-12 ***
mom25pY -0.11687 0.16758 -0.697 0.48556
ord23 -0.12778 0.17609 -0.726 0.46805
ord46 0.18667 0.21642 0.863 0.38841
ord7p 0.29848 0.27034 1.104 0.26956
ethnN -0.08608 0.34048 -0.253 0.80040
ethnS -0.03566 0.25374 -0.141 0.88823
momEdP 0.30963 0.15429 2.007 0.04477 *
momEdS 0.33060 0.33921 0.975 0.32975
husEdP 0.40964 0.16128 2.540 0.01109 *
husEdS 0.37926 0.28800 1.317 0.18788
husEdU 0.02114 0.24728 0.085 0.93187
momWorkY 0.27872 0.14060 1.982 0.04744 *
ruralY -0.64758 0.21204 -3.054 0.00226 **
pcInd81 -0.86155 0.35048 -2.458 0.01396 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation matrix not shown by default, as p = 16 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
# print the model results with correlations among the fixed effects
print(M4, correlation=TRUE)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 + kid2p | comm)
Data: guImmun
AIC BIC logLik deviance df.resid
2747.588 2861.136 -1353.794 2707.588 2139
Random effects:
Groups Name Std.Dev. Corr
mom (Intercept) 1.1329
comm (Intercept) 0.9825
kid2pY 0.7687 -0.68
Number of obs: 2159, groups: mom, 1595; comm, 161
Fixed Effects:
(Intercept) kid2pY mom25pY ord23 ord46 ord7p
-1.08578 1.38297 -0.11687 -0.12778 0.18667 0.29848
ethnN ethnS momEdP momEdS husEdP husEdS
-0.08608 -0.03566 0.30963 0.33060 0.40964 0.37926
husEdU momWorkY ruralY pcInd81
0.02114 0.27872 -0.64758 -0.86155
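To connect this output to the covariance matrix \(\boldsymbol\Psi^{(3)}\) defined above, the following short sketch (our own addition) extracts the estimated community-level covariance matrix of the random intercept and random slope, together with the family-level variance:
# estimated community-level covariance matrix of the random intercept and slope
VarCorr(M4)$comm
# estimated family-level random-intercept variance
VarCorr(M4)$mom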
In the next R chunk, we produce a caterpillar plot of the conditional modes of the random intercepts for families and the random intercepts and slopes for communities.
# for families
lattice::dotplot(ranef(M4, which = "mom", condVar = TRUE), scales = list(y = list(alternating = 0)))
$mom
[Caterpillar plot: conditional modes of the family-level random intercepts]
# for communities
lattice::dotplot(ranef(M4, which = "comm", condVar = TRUE), scales = list(y = list(alternating = 0)))
$comm
[Caterpillar plot: conditional modes of the community-level random intercepts and kid2pY slopes]
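If you prefer to build these plots yourself (for example with ggplot2), the conditional modes and their conditional standard deviations can also be extracted as a data frame. The following is a minimal sketch using lme4's as.data.frame() method for ranef() objects.
# Sketch: conditional modes and conditional SDs as a data frame,
# useful for constructing custom caterpillar plots
re_comm <- as.data.frame(ranef(M4, which = "comm", condVar = TRUE))
head(re_comm)  # columns: grpvar, term, grp, condval, condsd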
rstanarm
In this section, we use stan_glmer() to fit Model 4. As noted earlier, rstanarm applies default (weakly informative) priors, which are designed to provide moderate regularization and help stabilize computation. More information on priors is available in the vignette Prior Distribution for rstanarm Models. We use the default prior distributions for the parameters in M4_stanglmer by not specifying any prior options in the stan_glmer() function:
M4_stanglmer <- stan_glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork + rural + pcInd81 + (1 | mom) + (1 + kid2p| comm),
family = binomial("logit"),
data = guImmun,
seed = 349)
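Before interpreting the results, it is good practice to check that the chains have mixed well. The following is an optional sketch of quick diagnostics; Rhat values close to 1 and reasonably large effective sample sizes suggest convergence.
# Optional sketch: quick convergence checks for the fitted model
mon <- summary(M4_stanglmer)        # posterior summary, including Rhat and n_eff
range(mon[, "Rhat"], na.rm = TRUE)  # should be close to 1
plot(M4_stanglmer, plotfun = "trace", pars = c("(Intercept)", "kid2pY"))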
By default, rstanarm attempts to make its priors weakly informative by internally adjusting their scales. We explain how this works below.
First, we use the prior_summary() function to show which prior distributions were used for this model.
prior_summary(object = M4_stanglmer)
Priors for model 'M4_stanglmer'
------
Intercept (after predictors centered)
~ normal(location = 0, scale = 2.5)
Coefficients
Specified prior:
~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
Adjusted prior:
~ normal(location = [0,0,0,...], scale = [5.95,5.00,5.27,...])
Covariance
~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
Starting from the bottom part of the output:
prior_covariance applies to covariance matrices in multilevel models with varying slopes and intercepts. In other words, it specifies a prior distribution for the unknown covariance matrices of the group-specific coefficients. Specifically, the covariance matrix is decomposed into a correlation matrix and a vector of variances. The variances are in turn decomposed into the product of a simplex vector and the trace of the matrix. Finally, the trace is the product of the order of the matrix and the square of a scale parameter. The following diagram shows the decomposed components.
library(DiagrammeR)
DiagrammeR::grViz("digraph {
graph[layout = dot, rankdir = LR]
'Covariance matrix'
'Correlation matrix'
'Variance'
'Simplex vector'
'The trace of the matrix'
'The order of the matrix'
'The square of a scale parameter'
'Covariance matrix' -> 'Correlation matrix'
'Covariance matrix' -> 'Variance'
'Variance' -> 'Simplex vector'
'Variance' -> 'The trace of the matrix'
'The trace of the matrix' -> 'The order of the matrix'
'The trace of the matrix' ->'The square of a scale parameter'
}")
Overall, this prior on a covariance matrix is represented by the decov()
function.
The prior for a correlation matrix is called LKJ; it is proportional to the determinant of the correlation matrix raised to the power of a positive regularization parameter minus one. The default reg. = 1 implies that the prior is jointly uniform over all correlation matrices of that size.
For the simplex vector, a symmetric Dirichlet prior is used. The default conc. = 1 sets the single (positive) concentration parameter so that the prior is jointly uniform over the space of simplex vectors of that size.
The trace of a covariance matrix is equal to the sum of variances. We set the trace equal to the product of the order of the covariance matrix and the square of a positive scale parameter. The particular variances are set equal to the product of a simplex vector (which is non-negative and sums to 1) and the scalar trace. In other words, each element of the simplex vector represents the proportion of the trace attributable to the corresponding variable.
For the positive scale parameter, we use a Gamma prior distribution. The default shape = 1, scale = 1 implies a unit-exponential distribution.
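To make this decomposition concrete, here is a small numeric illustration (with made-up values) for the 2 × 2 community-level covariance matrix: the trace equals the order of the matrix times the squared scale parameter, and the individual variances are the simplex proportions of that trace.
# Numeric illustration of the decov() decomposition (hypothetical values)
order_K   <- 2                    # order of the community-level covariance matrix
scale_K   <- 1.2                  # hypothetical positive scale parameter
simplex   <- c(0.7, 0.3)          # hypothetical simplex vector (non-negative, sums to 1)
trace_K   <- order_K * scale_K^2  # trace = order * scale^2 = 2.88
variances <- simplex * trace_K    # variances of the intercept and the kid2pY slope
variances                         # 2.016 and 0.864, which sum to the trace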
For more details on prior distributions for the covariance matrix \(\boldsymbol\Psi^{(3)}\) in random-coefficient models, see the bottom part of the tutorial Intro to multilevel modeling using rstanarm and also the function explanation Prior distributions and options.
The default prior on a regression coefficient \(\beta_k\) is a normal distribution with mean zero. The default prior scale depends on the prior family; for a normal prior it is 2.5, which is intended to be weakly informative when explanatory variables have been rescaled to have a standard deviation of 0.5 (Gelman et al., 2008) (see Prior Distribution for rstanarm Models). To keep the default weakly informative on the original scale of the data, rstanarm adjusts the prior scales according to the standard deviations of the corresponding explanatory variables. As a result, the adjusted prior scales for the regression coefficients were [5.95, 5.00, 5.27, …].
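As a rough illustration of this adjustment, the sketch below assumes the rule is to divide the default scale of 2.5 by the sample standard deviation of each (dummy-coded) predictor; the resulting values should be close to the adjusted prior scales reported by prior_summary().
# Rough check of the prior scale adjustment (an assumed rule, for illustration only)
X <- model.matrix(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd +
                    momWork + rural + pcInd81, data = guImmun)[, -1]
round(2.5 / apply(X, 2, sd), 2)  # compare with the "Adjusted prior" scales above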
For the intercepts, the default prior is a normal distribution with zero mean and a standard deviation of 2.5. The note in parentheses informs us that the prior applies to the intercept after all predictors have been centered. In this case, we view the intercept as the value of the fixed part of the linear predictor when all covariates are equal to their means instead of zero, which makes the intercept more interpretable and meaningful.
To disable automatic rescaling, we can specify the prior explicitly and set its autoscale = FALSE argument. In addition, we can specify flat (non-informative) priors by setting prior = NULL in the stan_glmer() function. Users can also specify priors that reflect their prior information (see Prior distributions and options for more information), as in the sketch below.
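As a minimal sketch (not run here), the model could be refitted with explicitly specified priors and autoscaling disabled; the particular prior choices below are illustrative assumptions, not recommendations.
# Sketch: explicit priors with autoscaling disabled (illustrative values)
M4_custom <- stan_glmer(immun ~ kid2p + mom25p + ord + ethn + momEd + husEd +
                          momWork + rural + pcInd81 + (1 | mom) + (1 + kid2p | comm),
                        family = binomial("logit"),
                        data = guImmun,
                        prior = normal(0, 2.5, autoscale = FALSE),
                        prior_intercept = normal(0, 2.5, autoscale = FALSE),
                        prior_covariance = decov(regularization = 1, concentration = 1,
                                                 shape = 1, scale = 1),
                        seed = 349)
Returning to the model fitted with the default priors, we print the FB estimates and then, for comparison, the ML estimates from glmer():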
print(M4_stanglmer, digits = 2)
stan_glmer
family: binomial [logit]
formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 + kid2p | comm)
observations: 2159
------
Median MAD_SD
(Intercept) -1.54 0.61
kid2pY 2.04 0.31
mom25pY -0.21 0.26
ord23 -0.30 0.26
ord46 0.19 0.32
ord7p 0.49 0.42
ethnN -0.17 0.55
ethnS -0.11 0.40
momEdP 0.51 0.26
momEdS 0.51 0.57
husEdP 0.64 0.28
husEdS 0.62 0.48
husEdU 0.00 0.40
momWorkY 0.45 0.23
ruralY -1.00 0.35
pcInd81 -1.30 0.57
Error terms:
Groups Name Std.Dev. Corr
mom (Intercept) 2.78
comm (Intercept) 1.63
kid2pY 1.36 -0.65
Num. levels: mom 1595, comm 161
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
print(M4, digits = 2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: immun ~ kid2p + mom25p + ord + ethn + momEd + husEd + momWork +
rural + pcInd81 + (1 | mom) + (1 + kid2p | comm)
Data: guImmun
AIC BIC logLik deviance df.resid
2747.588 2861.136 -1353.794 2707.588 2139
Random effects:
Groups Name Std.Dev. Corr
mom (Intercept) 1.13
comm (Intercept) 0.98
kid2pY 0.77 -0.68
Number of obs: 2159, groups: mom, 1595; comm, 161
Fixed Effects:
(Intercept) kid2pY mom25pY ord23 ord46 ord7p
-1.086 1.383 -0.117 -0.128 0.187 0.298
ethnN ethnS momEdP momEdS husEdP husEdS
-0.086 -0.036 0.310 0.331 0.410 0.379
husEdU momWorkY ruralY pcInd81
0.021 0.279 -0.648 -0.862
Comparing the FB estimates with the ML estimates for Model 4, we notice that the FB posterior medians of the regression coefficients, as well as the random-effect standard deviations, are further from zero than the corresponding ML estimates.
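A quick way to see this is to place the two sets of fixed-effect estimates side by side; this sketch uses fixef(), which returns the posterior medians for the stanreg object.
# Sketch: ML estimates (glmer) vs. FB posterior medians (stan_glmer)
round(cbind(ML = fixef(M4), FB = fixef(M4_stanglmer)), 2)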
Lastly, in the next R chunk, we use Model 4 as an example to show how to display HTML tables for models estimated from glmer()
and stan_glmer
.
# Summary of mixed models as HTML table
library(sjPlot)
tab_model(M4)
Outcome: immun

Predictors | Odds Ratios | CI | p
---|---|---|---
(Intercept) | 0.34 | 0.17 – 0.69 | 0.003
kid2p [Y] | 3.99 | 2.72 – 5.85 | <0.001
mom25p [Y] | 0.89 | 0.64 – 1.24 | 0.486
ord [23] | 0.88 | 0.62 – 1.24 | 0.468
ord [46] | 1.21 | 0.79 – 1.84 | 0.388
ord [7p] | 1.35 | 0.79 – 2.29 | 0.270
ethn [N] | 0.92 | 0.47 – 1.79 | 0.800
ethn [S] | 0.96 | 0.59 – 1.59 | 0.888
momEd [P] | 1.36 | 1.01 – 1.84 | 0.045
momEd [S] | 1.39 | 0.72 – 2.71 | 0.330
husEd [P] | 1.51 | 1.10 – 2.07 | 0.011
husEd [S] | 1.46 | 0.83 – 2.57 | 0.188
husEd [U] | 1.02 | 0.63 – 1.66 | 0.932
momWork [Y] | 1.32 | 1.00 – 1.74 | 0.047
rural [Y] | 0.52 | 0.35 – 0.79 | 0.002
pcInd81 | 0.42 | 0.21 – 0.84 | 0.014

Random Effects | Value
---|---
σ2 | 3.29
τ00 mom | 1.28
τ00 comm | 0.97
τ11 comm.kid2pY | 0.59
ρ01 comm | -0.68
ICC | 0.37
N mom | 1595
N comm | 161
Observations | 2159
Marginal R2 / Conditional R2 | 0.129 / 0.450
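tab_model() also accepts Bayesian models; as a sketch (assuming your version of sjPlot supports stanreg objects), the analogous table for the FB fit reports posterior medians and credible intervals rather than p-values.
# Sketch: HTML table for the Bayesian fit (requires sjPlot support for stanreg objects)
tab_model(M4_stanglmer)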
# Tidying methods for an rstanarm model
library(broom.mixed)
# non-varying ("population") parameters
tidy(M4_stanglmer, conf.int = TRUE, prob = 0.5)
tidy(M4_stanglmer, conf.int = TRUE, conf.method = "HPDinterval", prob = 0.5)
# hierarchical sd & correlation parameters
tidy(M4_stanglmer, effects = "ran_pars")
# group-specific deviations from "population" parameters
tidy(M4_stanglmer, effects = "ran_vals")
# glance method
glance(M4_stanglmer)
In this tutorial, we illustrated how to fit multilevel logistic regression models within a fully Bayesian framework in rstanarm
. Rather than performing ML estimation using the glmer()
function in lme4
, we showed how to perform FB estimation (via the HMC approach) using the stan_glmer()
function. stan_glmer()
is easy to use as the syntax is similar to that of glmer()
. Several advantages of estimating these models using rstanarm
rather than the lme4
package have been demonstrated in this tutorial. Here is a summary of some major advantages:
- rstanarm provides better uncertainty estimates than the lme4 package by performing fully Bayesian inference.
- For models with more than a single scalar random effect, such as Model 4, the lme4 package can only use the Laplace approximation (nAGQ = 1), which may perform poorly, especially for binary responses with small cluster sizes (Joe, 2008).
- rstanarm provides posterior draws of the random effects from the joint posterior, whereas the ranef() function, after ML estimation with glmer(), provides only the conditional posterior modes. Having posterior draws allows us to summarize the random effects directly, for example via the summary() and print() methods (see the sketches after this list).
- There are a variety of data visualization options available for rstanarm models: the bayesplot package provides a library of plotting functions to visualize the posterior draws, and the shinystan package provides interactive diagnostics and posterior analyses based on the posterior draws.
- We can incorporate prior information by using rstanarm. We showed how the default weakly informative priors are constructed through automatic prior scale adjustments, and users can specify their own informative priors instead.
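As minimal sketches of these points, the code below pulls the posterior draws for a single family's random intercept and produces a posterior plot for a few coefficients. The family ID "2" (the first level of mom) and the plotted coefficients are chosen purely for illustration, and the parameter naming follows rstanarm's b[(Intercept) mom:ID] convention.
# Sketch 1: posterior draws for one family's random intercept,
# summarized by its posterior median and a 95% credible interval
draws <- as.matrix(M4_stanglmer, regex_pars = "b\\[\\(Intercept\\) mom:2\\]")
quantile(draws, probs = c(0.025, 0.5, 0.975))

# Sketch 2: a bayesplot-backed posterior plot for a few coefficients,
# via rstanarm's plot() method
plot(M4_stanglmer, plotfun = "areas", pars = c("kid2pY", "ruralY", "pcInd81"),
     prob = 0.95)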
In sum, although fitting models by approximate ML in the lme4
package tends to be much faster than fitting the models in rstanarm
, the benefits of employing fully Bayesian inference via MCMC in rstanarm
outweigh the cost. We hope that this tutorial has provided you with enough information to test this out yourself and enough support to apply what you’ve learned in your own research.
library(knitcitations)
library(bibtex)
read.bibtex("MLR.bib")
[1] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. "Variational
Inference: A Review for Statisticians". In: _Journal of the American
Statistical Association_ 112.518 (Apr. 2017), pp. 859-877. ISSN:
1537-274X. DOI: 10.1080/01621459.2017.1285773. <URL:
http://dx.doi.org/10.1080/01621459.2017.1285773>.
[2] A. Gelman, A. Jakulin, M. G. Pittau, et al. "A Weakly Informative
Default Prior Distribution for Logistic and Other Regression Models".
In: _The Annals of Applied Statistics_ 2.4 (2008), pp. 1360-1383. ISSN:
19326157. <URL: http://www.jstor.org/stable/30245139>.
[3] H. Joe. "Accuracy of Laplace approximation for discrete response
mixed models". In: _Computational Statistics & Data Analysis_ 52.12
(2008), pp. 5066-5074. <URL:
https://EconPapers.repec.org/RePEc:eee:csdana:v:52:y:2008:i:12:p:5066-5074>.