Draw J integer site sizes \(n_j\) from a target mean and coefficient of
variation, compute the per-site Neyman sampling variance
\(\widehat{se}_j^2 = \kappa / n_j\), and append n_j, se2_j, and
se_j columns to an upstream Layer 1 frame. This is the Layer 2 margin
generator for the site-size-driven path (Paradigm A in the blueprint)
— call it directly when composing the four layers manually, or let
sim_multisite call it for you.
Usage
gen_site_sizes(
  upstream,
  J,
  nj_mean = 50,
  cv = 0.5,
  nj_min = 5L,
  p = 0.5,
  R2 = 0,
  var_outcome = 1,
  engine = c("A2_modern", "A1_legacy")
)
Arguments
- upstream
Data frame with exactly J rows. Typically the output of gen_effects; must contain the canonical Layer 1 columns site_index, z_j, tau_j. Layer 2 columns (n_j, se_j, se2_j) must NOT be present yet.
- J
Integer. Number of sites; must equal nrow(upstream).
- nj_mean
Numeric (\(\ge \mathrm{nj\_min}\)). Target site-size mean on the engine scale. Default 50. Typical applied range: 20–500.
- cv
Numeric (\(\ge 0\)). Target site-size coefficient of variation. Default 0.5. Use cv = 0 for equal-size sites; cv = 0.5 for the JEBS reference range. Larger cv produces more heterogeneous sites.
- nj_min
Integer (\(\ge 1\)). Lower bound for public site sizes. Default 5. The engine output is floored at this bound.
- p
Numeric in (0, 1). Treatment-assignment proportion. Default 0.5 (balanced). Affects \(\kappa\) through Neyman allocation.
- R2
Numeric in [0, 1). Covariate-explained variance share at the site level. Default 0. Decreases \(\kappa\) and improves precision through the multiplier \(1 - R^2\).
- var_outcome
Numeric (> 0). Outcome variance. Default 1. Scales \(\kappa\) linearly.
- engine
Character. "A2_modern" (default, recommended) or "A1_legacy" (JEBS bit-parity reproduction only).
Value
The upstream tibble with three appended columns: n_j
(integer site size), se2_j (numeric sampling variance
\(\kappa / n_j\)), and se_j (numeric SE \(\sqrt{se2_j}\)). Two
attributes are attached: engine (the resolved engine name) and
kappa (the Neyman precision constant).
Details
Engine choice. Two engines back the site-size draw:
- "A2_modern" (default, recommended for new work)
Lower-truncated Gamma on the continuous target scale, then stochastic rounding to integer n_j. Preserves the target mean in expectation and matches cv exactly on the underlying continuous draw.
- "A1_legacy"
The JEBS paper's censor-then-round procedure. Preserved for bit-identical reproduction of the JEBS reference design and its replication fixtures. Can inflate the empirical mean near nj_min through censoring; not recommended for new work.
Pick A2 unless you are explicitly trying to reproduce a JEBS fixture. The
A1 engine is also restricted: combining A1 with non-trivial precision
dependence (dependence != "none") is refused by
validate_multisitedgp_design — A1 is for legacy
reproduction only.
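The idea behind a lower-truncated Gamma draw can be sketched in base R. This is an illustration under stated assumptions, not the package's A2 engine: draw_trunc_gamma is a hypothetical helper, the Gamma is parameterized from the untruncated mean and CV (shape = 1 / cv^2, scale = nj_mean * cv^2), and the truncation uses inverse-CDF sampling with runif() for brevity rather than the rgamma() stream the package documents. The sketch does not recalibrate the parameters after truncation, so its mean sits slightly above nj_mean when nj_min cuts into the distribution.

```r
# Hypothetical helper for illustration only; not the package's A2 engine.
draw_trunc_gamma <- function(J, nj_mean = 50, cv = 0.5, nj_min = 5) {
  stopifnot(J >= 1, nj_mean >= nj_min, cv >= 0, nj_min >= 1)
  if (cv == 0) {
    return(rep(nj_mean, J))                 # equal-size sites, no randomness
  }
  shape <- 1 / cv^2                         # Gamma CV equals 1 / sqrt(shape)
  scale <- nj_mean * cv^2                   # Gamma mean equals shape * scale
  # Inverse-CDF sampling restricted to the region above nj_min.
  p_low <- pgamma(nj_min, shape = shape, scale = scale)
  u <- runif(J, min = p_low, max = 1)
  x <- qgamma(u, shape = shape, scale = scale)
  pmax(x, nj_min)                           # guard against inverse-CDF round-off
}

set.seed(1)
x <- draw_trunc_gamma(2000, nj_mean = 50, cv = 0.5, nj_min = 5)
c(min = min(x), mean = mean(x), cv = sd(x) / mean(x))
```

The output is continuous; an integer n_j still requires a rounding step such as the stochastic rounding described for A2.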
Sampling variance. The per-site Neyman variance is
\(\kappa / n_j\) with
\(\kappa = \mathrm{var\_outcome}(1 - R^2) / (p (1 - p))\),
the standard Neyman-allocation precision constant. Pass p, R2, and
var_outcome to control \(\kappa\) explicitly; defaults
(p = 0.5, R2 = 0, var_outcome = 1) give \(\kappa = 4\), the
baseline used in the JEBS paper.
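As a quick check of the formula, the constant and the resulting standard errors can be computed by hand in base R. The helper kappa_sketch below is hypothetical and only mirrors the expression above; it is not the package's compute_kappa, whose signature is not shown here.

```r
# Hypothetical helper mirroring the kappa formula; not the package's compute_kappa().
kappa_sketch <- function(p = 0.5, R2 = 0, var_outcome = 1) {
  stopifnot(p > 0, p < 1, R2 >= 0, R2 < 1, var_outcome > 0)
  var_outcome * (1 - R2) / (p * (1 - p))
}

kappa_default  <- kappa_sketch()                   # 1 * (1 - 0) / (0.5 * 0.5) = 4
kappa_adjusted <- kappa_sketch(p = 0.3, R2 = 0.4)  # 0.6 / 0.21, smaller than 4

# Per-site sampling variance and SE for a few illustrative site sizes.
n_j   <- c(34L, 45L, 52L)
se2_j <- kappa_default / n_j
se_j  <- sqrt(se2_j)
data.frame(n_j, se2_j, se_j)
```

With the defaults, a site of size 34 gets se2_j = 4 / 34, roughly 0.118, and se_j roughly 0.343. Raising R2 lowers \(\kappa\), while moving p away from 0.5 raises it.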
For the formal Paradigm A vs Paradigm B contrast and the engine derivation, see the Margin and SE models — site-size and direct-precision paths vignette.
RNG policy
Stochastic rounding (A2 only) consumes one runif() draw for each
non-integer engine output. All-integer engine output, including the
cv = 0 deterministic path, consumes no rounding RNG. The engine itself
(Gamma draw under A2 or A1) consumes the usual rgamma() stream.
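A minimal base-R sketch of stochastic rounding that follows the stated policy is given below. stochastic_round is a hypothetical helper, not the package source: it rounds x up with probability equal to its fractional part, so the expected value is floor(x) + frac(x) = x, and it requests uniforms only for the non-integer entries.

```r
# Hypothetical helper illustrating the RNG policy; not the package source.
stochastic_round <- function(x) {
  lower <- floor(x)
  frac <- x - lower
  needs_draw <- frac > 0
  out <- lower
  # One runif() per non-integer value; nothing is drawn for integer input.
  if (any(needs_draw)) {
    u <- runif(sum(needs_draw))
    out[needs_draw] <- lower[needs_draw] + (u < frac[needs_draw])
  }
  as.integer(out)
}

# All-integer input (e.g. the cv = 0 path) leaves the RNG stream untouched.
set.seed(42)
rounded_equal <- stochastic_round(rep(40, 5))
next_after <- runif(1)
set.seed(42)
next_fresh <- runif(1)
identical(next_after, next_fresh)

# Non-integer input is rounded up or down, preserving the mean in expectation.
set.seed(1)
rounded_frac <- stochastic_round(rep(10.25, 1e5))
mean(rounded_frac)
```

Because integer inputs skip the draw entirely, a downstream runif() call sees the same stream whether or not the rounding step ran on them.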
References
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286 .
See also
gen_se_direct for the direct-precision (Paradigm B)
counterpart that takes precision targets directly;
sim_multisite for the wrapper that calls this in the
four-layer pipeline;
compute_kappa for the underlying Neyman precision
constant;
the M3 Margin and SE
models vignette.
Other family-margins:
gen_se_direct()
Examples
# Compose Layer 1 + Layer 2 manually.
effects <- gen_effects_gaussian(J = 10L)
gen_site_sizes(effects, J = 10L, nj_mean = 40, cv = 0.2)
#> # A tibble: 10 × 6
#> site_index z_j tau_j n_j se2_j se_j
#> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1 -0.130 -0.0261 34 0.118 0.343
#> 2 2 0.951 0.190 37 0.108 0.329
#> 3 3 0.471 0.0942 34 0.118 0.343
#> 4 4 0.335 0.0670 45 0.0889 0.298
#> 5 5 -1.55 -0.310 37 0.108 0.329
#> 6 6 -0.621 -0.124 48 0.0833 0.289
#> 7 7 -1.62 -0.323 47 0.0851 0.292
#> 8 8 0.853 0.171 52 0.0769 0.277
#> 9 9 1.89 0.378 30 0.133 0.365
#> 10 10 -0.867 -0.173 44 0.0909 0.302
# Larger draw with the JEBS reference cv = 0.5 and Neyman defaults.
effects50 <- gen_effects_gaussian(J = 50L, sigma_tau = 0.15)
sized <- gen_site_sizes(effects50, J = 50L, nj_mean = 50, cv = 0.5)
summary(sized$n_j)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 10.00 29.50 49.00 49.96 69.75 109.00
summary(sized$se_j)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.1916 0.2395 0.2857 0.3198 0.3683 0.6325
# JEBS bit-parity reproduction — engine A1.
a1 <- gen_site_sizes(effects, J = 10L, nj_mean = 40, cv = 0.5,
                     engine = "A1_legacy")
attr(a1, "engine") # "A1_legacy"
#> [1] "A1_legacy"