Generate observed site-level effect estimates
Source:R/layer4-gen_observations.R
gen_observations.RdDraw observed site-level estimates
\(\widehat{\tau}_j \sim \mathcal{N}(\tau_j,\, \widehat{se}_j^2)\) from
the upstream Layer 2 / Layer 3 frame and append them as a new
tau_j_hat column. This is the final layer of the four-layer pipeline
— it is conceptually simple (one Gaussian draw per site, conditional on
the latent effect and sampling variance) but the obs_fn extensibility
hook lets you swap in a non-Gaussian observation model when needed
(e.g., a Poisson-rate observation, a heavy-tailed alternative, or a
bootstrap from real data).
Arguments
- upstream
A Layer 2 or Layer 3 data frame with the canonical columns
site_index, z_j, tau_j, n_j, se2_j, se_j. Typically the output ofgen_site_sizes,gen_se_direct, or one of the Layer 3 dependence aligners.- obs_fn
Optional callback. If supplied, called as
obs_fn(tau_j = upstream$tau_j, se2_j = upstream$se2_j, ...)and must return a finite numeric vector of lengthJ.- ...
Additional arguments forwarded to
obs_fn.
Value
The upstream tibble with one appended numeric column,
tau_j_hat. An attribute observation_diagnostics is attached, with
entries method ("gaussian" or "custom"), legacy_a1_shuffle
(logical), and rng_draws (integer or NA).
Details
Default Gaussian path. For each site, draw
\(\widehat{\tau}_j \sim \mathcal{N}(\tau_j, \widehat{se}_j^2)\) using
base R's rnorm(). This matches the standard Stage-1 sampling
assumption in the JEBS paper and most multisite-trial / meta-analysis
literature.
Custom obs_fn extensibility. Pass a function with signature
obs_fn(tau_j, se2_j, ...) returning a finite numeric vector of length
J. Use this to swap in non-Gaussian observation models —
Poisson-rate, Student-t with heavy tails, or a non-parametric bootstrap
from a real dataset. The callback owns its own RNG.
Legacy A1 shuffle. When upstream was produced by
gen_site_sizes with engine = "A1_legacy" and the default
Gaussian path is used, the function first applies the JEBS-paper
observation-stage precision shuffle (one sample.int(J) call) before
drawing the Gaussian observations. This preserves bit-identical
reproduction of the JEBS reference design. Custom obs_fn callbacks do
NOT trigger the legacy shuffle (they're assumed to be modern code).
RNG policy
The default Gaussian path consumes exactly one rnorm() draw per site.
Under engine A1 with the default observation path, one extra
sample.int(J) call is consumed for the legacy shuffle (before the
rnorm() draws). User-supplied obs_fn hooks own their own RNG
behavior and do not trigger the legacy shuffle. All RNG is wrapped by
the caller's seed when invoked through sim_multisite /
sim_meta.
References
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286 .
See also
sim_multisite and sim_meta for the
wrappers that compose this in the four-layer pipeline;
gen_site_sizes and gen_se_direct for
the upstream Layer 2 generators;
align_hybrid_corr and the other Layer 3 aligners for
the precision-dependence step that typically precedes this.
Examples
# Default Gaussian observation path.
effects <- gen_effects_gaussian(J = 10L)
margins <- gen_site_sizes(effects, J = 10L, nj_mean = 40, cv = 0.2)
obs <- gen_observations(margins)
obs[, c("site_index", "tau_j", "tau_j_hat", "se_j")]
#> # A tibble: 10 × 4
#> site_index tau_j tau_j_hat se_j
#> <int> <dbl> <dbl> <dbl>
#> 1 1 -0.391 -0.194 0.289
#> 2 2 0.133 -0.0792 0.348
#> 3 3 -0.107 -0.216 0.312
#> 4 4 0.224 0.597 0.329
#> 5 5 0.279 0.700 0.275
#> 6 6 0.0224 0.324 0.272
#> 7 7 0.0526 0.163 0.320
#> 8 8 0.201 -0.406 0.298
#> 9 9 -0.00924 -0.176 0.338
#> 10 10 -0.130 -0.150 0.329
# Custom obs_fn — heavy-tailed Student-t observation noise.
heavy_obs <- function(tau_j, se2_j, df = 5) {
se_j <- sqrt(se2_j)
tau_j + se_j * stats::rt(length(tau_j), df = df) * sqrt((df - 2) / df)
}
obs_t <- gen_observations(margins, obs_fn = heavy_obs, df = 5)
attr(obs_t, "observation_diagnostics")$method # "custom"
#> [1] "custom"