Skip to contents

Draw observed site-level estimates \(\widehat{\tau}_j \sim \mathcal{N}(\tau_j,\, \widehat{se}_j^2)\) from the upstream Layer 2 / Layer 3 frame and append them as a new tau_j_hat column. This is the final layer of the four-layer pipeline — it is conceptually simple (one Gaussian draw per site, conditional on the latent effect and sampling variance) but the obs_fn extensibility hook lets you swap in a non-Gaussian observation model when needed (e.g., a Poisson-rate observation, a heavy-tailed alternative, or a bootstrap from real data).

Usage

gen_observations(upstream, obs_fn = NULL, ...)

Arguments

upstream

A Layer 2 or Layer 3 data frame with the canonical columns site_index, z_j, tau_j, n_j, se2_j, se_j. Typically the output of gen_site_sizes, gen_se_direct, or one of the Layer 3 dependence aligners.

obs_fn

Optional callback. If supplied, called as obs_fn(tau_j = upstream$tau_j, se2_j = upstream$se2_j, ...) and must return a finite numeric vector of length J.

...

Additional arguments forwarded to obs_fn.

Value

The upstream tibble with one appended numeric column, tau_j_hat. An attribute observation_diagnostics is attached, with entries method ("gaussian" or "custom"), legacy_a1_shuffle (logical), and rng_draws (integer or NA).

Details

Default Gaussian path. For each site, draw \(\widehat{\tau}_j \sim \mathcal{N}(\tau_j, \widehat{se}_j^2)\) using base R's rnorm(). This matches the standard Stage-1 sampling assumption in the JEBS paper and most multisite-trial / meta-analysis literature.

Custom obs_fn extensibility. Pass a function with signature obs_fn(tau_j, se2_j, ...) returning a finite numeric vector of length J. Use this to swap in non-Gaussian observation models — Poisson-rate, Student-t with heavy tails, or a non-parametric bootstrap from a real dataset. The callback owns its own RNG.

Legacy A1 shuffle. When upstream was produced by gen_site_sizes with engine = "A1_legacy" and the default Gaussian path is used, the function first applies the JEBS-paper observation-stage precision shuffle (one sample.int(J) call) before drawing the Gaussian observations. This preserves bit-identical reproduction of the JEBS reference design. Custom obs_fn callbacks do NOT trigger the legacy shuffle (they're assumed to be modern code).

RNG policy

The default Gaussian path consumes exactly one rnorm() draw per site. Under engine A1 with the default observation path, one extra sample.int(J) call is consumed for the legacy shuffle (before the rnorm() draws). User-supplied obs_fn hooks own their own RNG behavior and do not trigger the legacy shuffle. All RNG is wrapped by the caller's seed when invoked through sim_multisite / sim_meta.

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286 .

See also

sim_multisite and sim_meta for the wrappers that compose this in the four-layer pipeline; gen_site_sizes and gen_se_direct for the upstream Layer 2 generators; align_hybrid_corr and the other Layer 3 aligners for the precision-dependence step that typically precedes this.

Examples

# Default Gaussian observation path.
effects <- gen_effects_gaussian(J = 10L)
margins <- gen_site_sizes(effects, J = 10L, nj_mean = 40, cv = 0.2)
obs <- gen_observations(margins)
obs[, c("site_index", "tau_j", "tau_j_hat", "se_j")]
#> # A tibble: 10 × 4
#>    site_index    tau_j tau_j_hat  se_j
#>         <int>    <dbl>     <dbl> <dbl>
#>  1          1 -0.391     -0.194  0.289
#>  2          2  0.133     -0.0792 0.348
#>  3          3 -0.107     -0.216  0.312
#>  4          4  0.224      0.597  0.329
#>  5          5  0.279      0.700  0.275
#>  6          6  0.0224     0.324  0.272
#>  7          7  0.0526     0.163  0.320
#>  8          8  0.201     -0.406  0.298
#>  9          9 -0.00924   -0.176  0.338
#> 10         10 -0.130     -0.150  0.329

# Custom obs_fn — heavy-tailed Student-t observation noise.
heavy_obs <- function(tau_j, se2_j, df = 5) {
  se_j <- sqrt(se2_j)
  tau_j + se_j * stats::rt(length(tau_j), df = df) * sqrt((df - 2) / df)
}
obs_t <- gen_observations(margins, obs_fn = heavy_obs, df = 5)
attr(obs_t, "observation_diagnostics")$method  # "custom"
#> [1] "custom"