Generate direct sampling variances from informativeness and heterogeneity targets

Build a deterministic grid of per-site sampling variances \(\widehat{se}_j^2\) that exactly hits two user-specified targets: informativeness \(I = \sigma_\tau^2 / (\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2))\), where GM denotes the geometric mean across sites, and heterogeneity ratio \(R = \max_j \widehat{se}_j^2 / \min_j \widehat{se}_j^2\). The function then appends n_j = NA_integer_, se2_j, and se_j columns to an upstream Layer 1 frame. This is the Layer 2 margin generator for the direct-precision path (Paradigm B in the blueprint). Call it directly when composing the four layers manually, or let sim_meta call it for you.

Usage

gen_se_direct(
  upstream,
  J,
  I,
  R = 1,
  shuffle = TRUE,
  sigma_tau = 0.2,
  se_fn = NULL,
  se_args = list()
)

Arguments

upstream: Data frame with exactly J rows. Typically the output of gen_effects; must contain the canonical Layer 1 columns site_index, z_j, tau_j. Layer 2 columns must NOT be present yet.
J: Integer. Number of sites — must equal nrow(upstream).
I: Numeric in (0, 1). Target mean informativeness. Required. Typical values: 0.10 (low precision), 0.30 (modest), 0.50 (Lord-Wallace), 0.80 (high). Endpoints are degenerate.
R: Numeric (\(\ge 1\)). Target heterogeneity ratio \(\widehat{se}^2_{\max} / \widehat{se}^2_{\min}\). Default 1 (homogeneous precision). Larger values widen the precision spread.
shuffle: Logical. If TRUE (default), randomly permute the deterministic SE grid; R = 1 is unaffected. The (I, R) targets remain exact under permutation.
sigma_tau: Numeric (\(\ge 0\)). Between-site standard deviation on the response scale. Default 0.20. Used to back-compute the target geometric mean of the sampling variances, \(\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 (1 - I)/I\).
se_fn: Optional callback function(J, ...) returning a named list with at least se2_j. When non-NULL, replaces the deterministic grid; the (I, R) targets are no longer guaranteed exact.
se_args: Named list forwarded to se_fn as extra arguments after J. Default list().
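
The back-computation that links I and sigma_tau is a rearrangement of the informativeness definition. As a worked illustration, using \(\sigma_\tau = 0.2\) and \(I = 0.30\) (the values behind the first example on this page):

\[
I = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2)}
\;\Longrightarrow\;
\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 \, \frac{1 - I}{I},
\qquad
\text{e.g. } 0.2^2 \times \frac{0.70}{0.30} \approx 0.0933 .
\]

As \(I \to 0\) the implied geometric mean diverges, and as \(I \to 1\) it collapses to zero, which is why both endpoints are excluded.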

Value

The upstream tibble with three appended columns: n_j (always NA_integer_ — no site-size margin under the direct path), se2_j (numeric sampling variance), se_j (numeric SE \(\sqrt{se2_j}\)). Attributes attached: engine ("paradigm_B_deterministic" or "paradigm_B_custom"), I, R, direct_se_diagnostics.

Details

Under the default deterministic-grid mode (se_fn = NULL), the returned SE values hit the targets exactly: the geometric mean of \(\widehat{se}_j^2\) is \(\sigma_\tau^2 (1 - I)/I\) and the max/min ratio equals R. With shuffle = TRUE (the default), the grid values are randomly assigned to sites. The multiset of SE values, and therefore both targets, is unchanged; only the order differs. This is useful before downstream rank-correlation alignment.
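
The two invariants can be checked by hand in base R. The sketch below assumes a log-spaced grid running from GM / sqrt(R) to GM * sqrt(R). That construction reproduces the values printed in the first example on this page, but it is an illustration only, and the package's internal code may build the grid differently. The names gm, se2_grid and shuffled are local to this sketch.

```r
# Targets matching the first example on this page.
sigma_tau <- 0.2
I <- 0.30
R <- 2
J <- 10L

# Geometric mean implied by the informativeness target.
gm <- sigma_tau^2 * (1 - I) / I

# Assumed construction: a log-spaced grid, symmetric around log(gm),
# so the geometric mean is gm and max/min equals R.
se2_grid <- gm * R^seq(-0.5, 0.5, length.out = J)

# Recover both targets from the grid.
gm_realized <- exp(mean(log(se2_grid)))
I_realized  <- sigma_tau^2 / (sigma_tau^2 + gm_realized)
R_realized  <- max(se2_grid) / min(se2_grid)

# A permutation changes the assignment to sites, not the multiset of values.
shuffled <- sample(se2_grid)

all.equal(I_realized, I)
all.equal(R_realized, R)
all.equal(sort(shuffled), se2_grid)
```

The endpoints of se2_grid come out near 0.0660 and 0.132, in line with the se2_j values shown for sites 1 and 10 in the first example.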

Custom se_fn extensibility. To replace the deterministic grid with a user-supplied SE distribution, supply se_fn(J, ...) returning a named list with at least se2_j (length J) and, optionally, n_j. With a custom se_fn the deterministic-target invariant no longer applies: the realized I and R are reported in the diagnostics but are not constrained to match the inputs.
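
A callback satisfying this contract might look like the sketch below. The function se_fn_lognormal and its arguments gm and sdlog are hypothetical names chosen for illustration; they are not part of the package. The block runs in base R, with the intended gen_se_direct call left as a comment because it needs the package and a Layer 1 frame.

```r
# Hypothetical callback: lognormal sampling variances whose log-scale
# centre is log(gm). Only se2_j is returned; n_j is optional and omitted.
se_fn_lognormal <- function(J, gm = 0.04, sdlog = 0.3) {
  se2 <- rlnorm(J, meanlog = log(gm), sdlog = sdlog)
  list(se2_j = se2)
}

set.seed(1)
out <- se_fn_lognormal(50L, gm = 0.04, sdlog = 0.3)

# Realized informativeness and heterogeneity ratio for sigma_tau = 0.2.
# These are random quantities and will not equal any preset (I, R) exactly.
sigma_tau  <- 0.2
I_realized <- sigma_tau^2 / (sigma_tau^2 + exp(mean(log(out$se2_j))))
R_realized <- max(out$se2_j) / min(out$se2_j)

# Intended use, assuming `effects50` is a 50-row Layer 1 frame:
# gen_se_direct(effects50, J = 50L, I = 0.5,
#               se_fn = se_fn_lognormal,
#               se_args = list(gm = 0.04, sdlog = 0.3))
```

Comparing I_realized and R_realized against the requested values gives a sense of how far a stochastic callback drifts from the targets.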

For the formal Paradigm A vs Paradigm B contrast and the grid derivation, see the Margin and SE models — site-size and direct-precision paths vignette.

RNG policy

shuffle = TRUE uses the sample() / sample.int() RNG policy active in the R session whenever the heterogeneity ratio R > 1; reproducing a fixed-seed permutation therefore requires the same sampling-kind setting (the sample.kind reported by RNGkind()). The homogeneous R = 1 path and the shuffle = FALSE path consume no RNG.
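
The reproducibility requirement can be illustrated with base R alone. This sketch permutes a small stand-in grid with sample.int(); it does not call gen_se_direct, and the way the package invokes the sampler internally is not shown here.

```r
# Stand-in grid of five sampling variances (illustrative values only).
se2_grid <- 0.04 * 4^seq(-0.5, 0.5, length.out = 5L)

# The same seed under the same sampling kind gives the same permutation.
set.seed(2025)
perm_a <- sample.int(length(se2_grid))
set.seed(2025)
perm_b <- sample.int(length(se2_grid))
identical(perm_a, perm_b)

# The third element of RNGkind() is the sample.kind in effect.
# Record it alongside the seed when a permutation must be reproducible.
RNGkind()[3]

# Either permutation reorders the grid without changing its values.
all.equal(sort(se2_grid[perm_a]), se2_grid)
```

Scripts seeded under a different sample.kind (for example "Rounding" rather than "Rejection") can produce a different permutation from the same seed.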

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286.

Examples

# Compose Layer 1 + Layer 2 (direct-precision) manually.
effects <- gen_effects_gaussian(J = 10L)
gen_se_direct(effects, J = 10L, I = 0.30, R = 2, shuffle = FALSE)
#> # A tibble: 10 × 6
#>    site_index     z_j    tau_j   n_j  se2_j  se_j
#>         <int>   <dbl>    <dbl> <int>  <dbl> <dbl>
#>  1          1 -1.83   -0.367      NA 0.0660 0.257
#>  2          2  1.41    0.282      NA 0.0713 0.267
#>  3          3  0.380   0.0760     NA 0.0770 0.277
#>  4          4 -0.0157 -0.00313    NA 0.0832 0.288
#>  5          5 -1.57   -0.315      NA 0.0898 0.300
#>  6          6 -0.353  -0.0705     NA 0.0970 0.311
#>  7          7  0.622   0.124      NA 0.105  0.324
#>  8          8  0.413   0.0827     NA 0.113  0.336
#>  9          9 -0.0992 -0.0198     NA 0.122  0.350
#> 10         10 -0.196  -0.0392     NA 0.132  0.363

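# Larger draw with R = 4: wide precision spread.
# Note: gen_se_direct() falls back to its own default sigma_tau = 0.2 here,
# not the 0.15 passed to gen_effects_gaussian(); pass sigma_tau = 0.15 to
# target I against the generated effects.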
effects50 <- gen_effects_gaussian(J = 50L, sigma_tau = 0.15)
direct <- gen_se_direct(effects50, J = 50L, I = 0.5, R = 4)
summary(direct$se_j)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.1414  0.1682  0.2000  0.2042  0.2378  0.2828 
attr(direct, "engine")  # "paradigm_B_deterministic"
#> [1] "paradigm_B_deterministic"