Generate direct sampling variances from informativeness and heterogeneity targets
Source: R/layer2-gen_se_direct.R
Build a deterministic grid of per-site sampling variances
\(\widehat{se}_j^2\) that exactly hits user-specified informativeness
\(I = \sigma_\tau^2 / (\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2))\),
where GM is the geometric mean across sites,
and heterogeneity-ratio
\(R = \max_j \widehat{se}_j^2 / \min_j \widehat{se}_j^2\) targets, then append
n_j = NA_integer_, se2_j, and se_j columns to an upstream Layer 1
frame. This is the Layer 2 margin generator for the direct-precision
path (Paradigm B in the blueprint). Call it directly when composing
the four layers manually, or let sim_meta call it for you.
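Solving the informativeness definition for the geometric mean gives the variance target that sigma_tau and I jointly pin down:

```latex
I = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2)}
\;\Longleftrightarrow\;
\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 \, \frac{1 - I}{I},
\qquad
\mathrm{GM}(\widehat{se}_j^2) = \Bigl(\prod_{j=1}^{J} \widehat{se}_j^2\Bigr)^{1/J}.
```

For example, sigma_tau = 0.20 and I = 0.30 give a target geometric mean of 0.04 * 0.7 / 0.3, about 0.0933.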
Usage
gen_se_direct(
upstream,
J,
I,
R = 1,
shuffle = TRUE,
sigma_tau = 0.2,
se_fn = NULL,
se_args = list()
)
Arguments
- upstream: Data frame with exactly J rows, typically the output of gen_effects. It must contain the canonical Layer 1 columns site_index, z_j, and tau_j; Layer 2 columns must NOT be present yet.
- J: Integer. Number of sites; must equal nrow(upstream).
- I: Numeric in (0, 1). Target mean informativeness, as defined above via the geometric mean of the sampling variances. Required. Typical values: 0.10 (low precision), 0.30 (modest), 0.50 (Lord-Wallace), 0.80 (high). The endpoints 0 and 1 are degenerate.
- R: Numeric (\(\ge 1\)). Target heterogeneity ratio \(\widehat{se}^2_{\max} / \widehat{se}^2_{\min}\). Default 1 (homogeneous precision); larger values widen the precision spread.
- shuffle: Logical. If TRUE (default), randomly permute the deterministic SE grid across sites; R = 1 is unaffected. The (I, R) targets remain exact under permutation.
- sigma_tau: Numeric (\(\ge 0\)). Between-site standard deviation on the response scale. Default 0.20. Used to back-compute the target geometric mean \(\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 (1 - I)/I\).
- se_fn: Optional callback function(J, ...) returning a named list with at least se2_j. When non-NULL, it replaces the deterministic grid, and the (I, R) targets are no longer guaranteed to be exact.
- se_args: Named list forwarded to se_fn as extra arguments after J. Default list().
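As a quick numeric check of the sigma_tau back-computation, the sketch below tabulates the target geometric-mean variance for the typical I values listed above at the default sigma_tau = 0.20. The helper names target_gm_se2() and implied_I() are hypothetical, not package functions.

```r
# Hypothetical helpers (not package functions).
# Target geometric mean of the sampling variances, from the definition of I:
#   GM(se2_j) = sigma_tau^2 * (1 - I) / I
target_gm_se2 <- function(I, sigma_tau = 0.20) {
  stopifnot(all(I > 0 & I < 1), sigma_tau >= 0)
  sigma_tau^2 * (1 - I) / I
}

# Inverse map: informativeness implied by a given geometric-mean variance.
implied_I <- function(gm_se2, sigma_tau = 0.20) {
  sigma_tau^2 / (sigma_tau^2 + gm_se2)
}

I_typical <- c(low = 0.10, modest = 0.30, lord_wallace = 0.50, high = 0.80)
gm_se2 <- target_gm_se2(I_typical)
round(cbind(I = I_typical, gm_se2 = gm_se2, gm_se = sqrt(gm_se2)), 4)
```

At sigma_tau = 0.20 the targets are 0.36, about 0.0933, 0.04 and 0.01 (typical standard errors of 0.6, about 0.306, 0.2 and 0.1), and feeding each back into implied_I() recovers the input I.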
Value
The upstream tibble with three appended columns: n_j (NA_integer_ on the
deterministic path, which has no site-size margin), se2_j
(numeric sampling variance), and se_j (numeric standard error, the square root of se2_j).
Attributes attached: engine ("paradigm_B_deterministic" or
"paradigm_B_custom"), I, R, and direct_se_diagnostics.
Details
Under the default deterministic-grid mode (se_fn = NULL), the returned
SE values hit the targets exactly: the geometric mean of \(\widehat{se}_j^2\)
is \(\sigma_\tau^2 (1 - I)/I\) and the max/min ratio equals R.
Setting shuffle = TRUE randomly permutes the assignment of grid values
to sites. The multiset of SE values is preserved, so the targets remain
exact, but the site order changes, which is useful before
downstream rank-correlation alignment.
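The exactness claim can be illustrated with a standalone sketch. It assumes the grid is log-evenly spaced (geometric) and centred on the target geometric mean; that assumption is inferred from, and reproduces, the se2_j column printed in the first example, but the package's actual derivation is the one in the vignette. sketch_se2_grid() is a hypothetical name.

```r
# Hypothetical re-implementation for illustration only (not the package code).
# Assumes a geometric grid: se2_k = gm * R^e_k with exponents e_k evenly
# spaced on [-1/2, 1/2], so max/min = R and the geometric mean is gm.
sketch_se2_grid <- function(J, I, R = 1, sigma_tau = 0.20, shuffle = TRUE) {
  stopifnot(J >= 2, I > 0, I < 1, R >= 1, sigma_tau >= 0)
  gm <- sigma_tau^2 * (1 - I) / I            # target GM(se2_j)
  if (R == 1) return(rep(gm, J))             # homogeneous: no RNG consumed
  expo <- seq(-0.5, 0.5, length.out = J)     # symmetric exponents, mean 0
  se2 <- gm * R^expo
  if (shuffle) se2 <- se2[sample.int(J)]     # permutation keeps the multiset
  se2
}

se2 <- sketch_se2_grid(J = 10L, I = 0.30, R = 2, shuffle = FALSE)
signif(se2, 3)          # compare with the se2_j column in Examples
exp(mean(log(se2)))     # geometric mean, equal to 0.04 * 0.7 / 0.3
max(se2) / min(se2)     # equal to R = 2
```

Because the exponents are symmetric about zero, the geometric mean equals gm for any J, and permuting with sample.int() only reorders values, which is why shuffling cannot move either target.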
Custom se_fn extensibility. Supply se_fn(J, ...) returning a
named list with at least se2_j (length J) and optionally n_j to
replace the deterministic grid with a user-supplied SE distribution.
With a custom se_fn, the deterministic-target invariant no longer
applies: the realized I and R are reported in the diagnostics but are
not constrained to match the inputs.
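A sketch of a custom callback, assuming only the contract stated above (J as the first argument, extra arguments from se_args, a named list containing se2_j). lognormal_se_fn(), realized_I() and realized_R() are hypothetical names, and the do.call() line only mimics the forwarding of se_args; it is an assumption about the call mechanics, not the package's code.

```r
# Hypothetical custom callback: J comes first, extra arguments arrive via
# se_args, and the return value is a named list containing se2_j (length J).
lognormal_se_fn <- function(J, meanlog = log(0.09), sdlog = 0.4) {
  list(se2_j = rlnorm(J, meanlog = meanlog, sdlog = sdlog))
}

# Realized targets, computed from the definitions of I and R on this page.
realized_I <- function(se2_j, sigma_tau = 0.20) {
  sigma_tau^2 / (sigma_tau^2 + exp(mean(log(se2_j))))
}
realized_R <- function(se2_j) max(se2_j) / min(se2_j)

# Mimic forwarding se_args after J (assumed call mechanics).
set.seed(42)
se_args <- list(sdlog = 0.4)
out <- do.call(lognormal_se_fn, c(list(J = 20L), se_args))
c(I = realized_I(out$se2_j), R = realized_R(out$se2_j))  # random, not exact

# With the package loaded, the same callback would be passed as:
# effects20 <- gen_effects_gaussian(J = 20L)
# gen_se_direct(effects20, J = 20L, I = 0.30,
#               se_fn = lognormal_se_fn, se_args = list(sdlog = 0.4))
```

The realized values depend on the random draw, which is exactly why the custom path reports I and R in the diagnostics instead of guaranteeing them.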
For the formal Paradigm A vs Paradigm B contrast and the grid derivation, see the Margin and SE models — site-size and direct-precision paths vignette.
RNG policy
When shuffle = TRUE and R > 1, the permutation is drawn with sample() /
sample.int() under the session's active RNG settings, so reproducing a
fixed-seed permutation also requires the same sample.kind setting (see
RNGkind()). The homogeneous R = 1 path and the shuffle = FALSE path
consume no random numbers.
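A minimal base-R illustration of the reproducibility requirement, independent of this package: the same seed reproduces a permutation only when sample.kind is also the same.

```r
# Same seed + same sample.kind => same permutation.
# "Rejection" is the default sample.kind since R 3.6.0; the older
# "Rounding" sampler generally yields different permutations.
set.seed(2025, sample.kind = "Rejection")
perm_a <- sample.int(10L)
set.seed(2025, sample.kind = "Rejection")
perm_b <- sample.int(10L)
identical(perm_a, perm_b)   # TRUE
RNGkind()[3]                # "Rejection"
```

Recording RNGkind() alongside the seed is a cheap way to make shuffled SE assignments replayable across R versions.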
References
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286
See also
gen_site_sizes for the site-size-driven (Paradigm A)
counterpart that builds SE from sample sizes;
sim_meta for the wrapper that calls this in the
four-layer pipeline;
compute_I and informativeness for reading
the realized informativeness from the result;
the M3 Margin and SE
models vignette.
Other family-margins:
gen_site_sizes()
Examples
# Compose Layer 1 + Layer 2 (direct-precision) manually.
effects <- gen_effects_gaussian(J = 10L)
gen_se_direct(effects, J = 10L, I = 0.30, R = 2, shuffle = FALSE)
#> # A tibble: 10 × 6
#> site_index z_j tau_j n_j se2_j se_j
#> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1 -1.83 -0.367 NA 0.0660 0.257
#> 2 2 1.41 0.282 NA 0.0713 0.267
#> 3 3 0.380 0.0760 NA 0.0770 0.277
#> 4 4 -0.0157 -0.00313 NA 0.0832 0.288
#> 5 5 -1.57 -0.315 NA 0.0898 0.300
#> 6 6 -0.353 -0.0705 NA 0.0970 0.311
#> 7 7 0.622 0.124 NA 0.105 0.324
#> 8 8 0.413 0.0827 NA 0.113 0.336
#> 9 9 -0.0992 -0.0198 NA 0.122 0.350
#> 10 10 -0.196 -0.0392 NA 0.132 0.363
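# Larger draw with R = 4: wide precision spread. gen_se_direct() uses its
# own sigma_tau (default 0.20), not the 0.15 passed to gen_effects_gaussian().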
effects50 <- gen_effects_gaussian(J = 50L, sigma_tau = 0.15)
direct <- gen_se_direct(effects50, J = 50L, I = 0.5, R = 4)
summary(direct$se_j)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.1414 0.1682 0.2000 0.2042 0.2378 0.2828
attr(direct, "engine") # "paradigm_B_deterministic"
#> [1] "paradigm_B_deterministic"
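The printed output of the first example can be checked against its targets with base R alone. This is a rough check on values copied from the printout (rounded to three significant figures), using the default sigma_tau = 0.20 that gen_se_direct() applied in that call.

```r
# se2_j values copied from the first example's printout (3 significant
# figures), so all checks below are approximate.
se2_printed <- c(0.0660, 0.0713, 0.0770, 0.0832, 0.0898,
                 0.0970, 0.105, 0.113, 0.122, 0.132)
sigma_tau <- 0.20
gm_se2 <- exp(mean(log(se2_printed)))          # geometric mean
I_realized <- sigma_tau^2 / (sigma_tau^2 + gm_se2)
R_realized <- max(se2_printed) / min(se2_printed)
round(c(GM = gm_se2, I = I_realized, R = R_realized), 3)
```

The result comes out close to GM 0.093, I 0.30 and R 2, matching the I = 0.30, R = 2 inputs of that call.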