gen_effects() is the Layer 1 entry point of the multisiteDGP pipeline. It
draws standardized site effects \(z_j\) from one of the eight \(G\) shapes in
the catalog (six parametric shapes plus the "User" and "DPM" callback routes)
and returns a forward-compatible tibble that Layers 2 through 4 can consume. Most users invoke it indirectly
through sim_multisite or sim_meta; call it
directly when composing the four layers manually or auditing a single
layer in isolation. Shape selection is controlled by true_dist,
shape-specific parameters travel through theta_G, and a user callback
g_fn overrides the catalog when true_dist = "User".
Arguments
- J
Integer. Number of sites.
- true_dist
Character. One of "Gaussian", "StudentT", "SkewN", "ALD", "Mixture", "PointMassSlab", "User", or "DPM". Default "Gaussian". If g_fn is supplied without true_dist, the package auto-selects "User".
- tau
Numeric. Grand mean on the response scale. Default 0.
- sigma_tau
Numeric (\(\ge 0\)). Between-site standard deviation on the response scale (not variance). Default 0.20.
- variance
Numeric. Legacy Gaussian variance argument. Default 1. The unit-variance convention requires variance = 1; other shapes ignore it.
- theta_G
Named list of shape-specific parameters. Keys vary by true_dist; see the eight-shape catalog under Details.
- formula
One-sided formula for site-level covariates (e.g., ~ x1 + x2), or NULL.
- beta
Numeric coefficient vector matching the columns of the model matrix built from formula, or NULL.
- data
A data.frame containing the predictors named in formula, or NULL.
- g_fn
Optional user callback for true_dist = "User" (or for the "DPM" bridge). Receives J and returns a length-J numeric vector.
- g_returns
Character. "standardized" (Convention A, default): the callback returns standardized residuals \(z_j\) and the package rescales. "raw" (Convention B): the callback returns response-scale effects and the package does not rescale.
- audit_g
Logical. When g_returns = "standardized", validate that the callback draws meet the unit-moment contract. Default TRUE. Has no effect under g_returns = "raw".
- upstream
Reserved for future layer composition. Leave NULL (the default); passing a non-NULL value aborts.
Value
A tibble with one row per site:
- site_index
Integer 1..J, preserved through downstream layers.
- z_j
Standardized residual effect, with mean 0 and variance 1 by construction.
- tau_j
Response-scale latent effect, \(\tau + X_j\boldsymbol{\beta} + \sigma_\tau\,z_j\).
- <covariate columns>
Pass-through from data if formula was non-NULL.
- latent_component
Character; for true_dist = "Mixture", names which mixture component each row was drawn from. Absent for the other seven shapes.
The tibble carries no S3 class beyond tbl_df — Layer 2 functions add
the package's classes on top.
Details
The eight built-in \(G\) distributions are:
- "Gaussian" (gen_effects_gaussian)
Standard normal, the canonical baseline. No theta_G keys.
- "StudentT" (gen_effects_studentt)
Standardized Student-\(t\) with degrees of freedom theta_G$nu (numeric, > 2). Heavier tails than Gaussian.
- "SkewN" (gen_effects_skewn)
Standardized skew-normal with slant theta_G$slant (numeric). Asymmetric shape.
- "ALD" (gen_effects_ald)
Standardized asymmetric Laplace with asymmetry theta_G$rho \(\in (0, 1)\).
- "Mixture" (gen_effects_mixture)
Two-component normal mixture with theta_G$delta (component separation), theta_G$eps (mixing weight), theta_G$ups (variance ratio). Use for bimodal or contaminated effects.
- "PointMassSlab" (gen_effects_pmslab)
Point mass at 0 with probability theta_G$pi0, plus a continuous slab governed by theta_G$slab_shape, theta_G$mu_slab, theta_G$sigma_slab. Use when a fraction of sites have null effects.
- "User" (gen_effects_user)
Any user callback g_fn returning length-J standardized residuals (or raw response-scale effects under g_returns = "raw").
- "DPM" (gen_effects_dpm)
Dirichlet-process mixture, currently available only via the g_fn callback bridge. Direct DPM is unimplemented in the current release.
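To make "standardized" concrete for one catalog entry: a Student-\(t\) variate with \(\nu > 2\) degrees of freedom has variance \(\nu / (\nu - 2)\), so dividing by the square root of that quantity gives unit variance. The base-R sketch below shows only that scaling. The helper name draw_std_t is hypothetical, and this is not necessarily how gen_effects_studentt is implemented.

```r
# Hypothetical helper (not part of multisiteDGP): unit-variance Student-t draws.
draw_std_t <- function(J, nu) {
  stopifnot(nu > 2)                      # the variance is finite only for nu > 2
  rt(J, df = nu) / sqrt(nu / (nu - 2))   # divide by the theoretical SD
}

set.seed(42)
z <- draw_std_t(1e5, nu = 5)
c(mean = mean(z), var = var(z))          # should be close to 0 and 1
```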
When to call this directly. For most users,
sim_multisite or sim_meta is the right entry
point — direct calls to gen_effects() are an advanced surface. The three
situations that warrant a direct call are: composing the four layers
manually to inspect or modify the Layer 1 → Layer 2 contract; checking
Layer 1 in isolation when a downstream diagnostic looks suspect; and
testing a g_fn callback's output before plugging it into the full
simulation.
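For the third situation, a quick pre-flight check on a callback might look like the base-R sketch below. It tests only what the g_fn contract states (a length-J numeric vector, with mean near 0 and variance near 1 under Convention A). The helper name check_g_fn, the large-J draw, and the tolerance are illustrative assumptions, not the package's audit_g logic.

```r
# Hypothetical pre-flight check; the sample size and tolerance are arbitrary choices.
check_g_fn <- function(g_fn, J = 1e5, tol = 0.05) {
  z <- g_fn(J)
  stopifnot(is.numeric(z), length(z) == J, all(is.finite(z)))
  c(mean_ok = abs(mean(z)) < tol,
    var_ok  = abs(var(z) - 1) < tol)
}

set.seed(1)
ok_std  <- check_g_fn(function(J) rnorm(J))         # standard normal: meets the contract
ok_unif <- check_g_fn(function(J) runif(J, -1, 1))  # variance 1/3: fails the variance check
```

A callback that fails the variance check, like the uniform one here, would need rescaling before use under Convention A, or could be passed with g_returns = "raw" if response-scale output is intended.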
Unit-variance convention. All eight shapes share a unit-variance
standardization: the package draws \(z_j\) with \(E[z_j] = 0\) and
\(\mathrm{Var}(z_j) = 1\), then rescales to
\(\tau_j = \tau + X_j\boldsymbol{\beta} + \sigma_\tau\,z_j\). This makes
sigma_tau a single comparable knob across shapes — heterogeneity targets
mean the same thing whether true_dist = "Gaussian" or
true_dist = "ALD".
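The comparability claim follows directly from the rescaling formula. Treating \(X_j\) as fixed and using only \(E[z_j] = 0\) and \(\mathrm{Var}(z_j) = 1\):

\[
E[\tau_j \mid X_j] = \tau + X_j\boldsymbol{\beta}, \qquad
\mathrm{Var}(\tau_j \mid X_j) = \sigma_\tau^2\,\mathrm{Var}(z_j) = \sigma_\tau^2 .
\]

Neither expression involves the shape of \(G\), so the conditional standard deviation of \(\tau_j\) equals sigma_tau for every standardized shape.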
Convention A vs Convention B (user callbacks). Under
g_returns = "standardized" (Convention A, the default) the callback
returns standardized residuals \(z_j\); the package rescales by
sigma_tau and adds tau (and any covariate adjustment) to form tau_j.
Under g_returns = "raw" (Convention B) the callback returns the
response-scale effect directly; the package leaves it untouched.
Convention A integrates with downstream diagnostics (notably
informativeness and heterogeneity_ratio)
without further work; Convention B is for callbacks where standardization
is meaningless or undesirable. See gen_effects_user.
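The difference between the two conventions can be sketched in a few lines of base R. The function assemble_tau is a hypothetical stand-in written for this illustration, not the package's internal code, and it omits the covariate term.

```r
# Hypothetical illustration of the two callback conventions (no covariates).
assemble_tau <- function(g_out, g_returns = c("standardized", "raw"),
                         tau = 0, sigma_tau = 0.2) {
  g_returns <- match.arg(g_returns)
  if (g_returns == "standardized") {
    tau + sigma_tau * g_out   # Convention A: rescale z_j, then shift by tau
  } else {
    g_out                     # Convention B: callback output is used as-is
  }
}

z <- c(-1, 0, 1)
tau_a <- assemble_tau(z, "standardized", tau = 0.1, sigma_tau = 0.2)
tau_b <- assemble_tau(z, "raw")
```

Under Convention A the same callback output lands on the response scale set by tau and sigma_tau; under Convention B those two arguments play no role in this sketch.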
Covariate adjustment. When formula is non-NULL, a model
matrix \(X\) is built from data and combined with beta to form
\(X_j\boldsymbol{\beta}\), which enters the linear predictor for
\(\tau_j\) additively. The covariate columns from data pass through
to the returned tibble so downstream layers can recover them.
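The covariate term can be reproduced by hand with base R's model.matrix(). The sketch below assumes the intercept column is dropped, so that tau plays the role of the intercept. That assumption is inferred from the Examples, where beta has length 1 for formula = ~ x, and it may not match the package's internals exactly.

```r
set.seed(1)
sites <- data.frame(x = rnorm(20))

# Build the design matrix, then drop the intercept column (assumption: tau
# already acts as the intercept, so beta has one entry per covariate column).
X <- model.matrix(~ x, data = sites)
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]

beta      <- 0.3
tau       <- 0
sigma_tau <- 0.15
z         <- rnorm(20)   # stands in for the standardized Layer 1 draws

# Linear predictor: tau + X_j beta + sigma_tau * z_j
tau_j <- as.numeric(tau + X %*% beta + sigma_tau * z)
```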
For per-shape derivations and decision rubrics, see the
G-distribution catalog
and standardization vignette. For the g_fn callback contract, see the
Custom G distributions
vignette. For the formula / beta / data covariate surface, see the
Covariates and
precision dependence vignette.
References
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286.
See also
Shape-specific generators: gen_effects_gaussian,
gen_effects_studentt, gen_effects_skewn,
gen_effects_ald, gen_effects_mixture,
gen_effects_pmslab, gen_effects_user,
gen_effects_dpm.
Wrappers that compose all four layers: sim_multisite,
sim_meta.
The M2 G-distribution
catalog and
M5 Custom G
distributions vignettes.
Other family-effects:
gen_effects_ald(),
gen_effects_dpm(),
gen_effects_gaussian(),
gen_effects_mixture(),
gen_effects_pmslab(),
gen_effects_skewn(),
gen_effects_studentt(),
gen_effects_user()
Examples
# Gaussian (default — the canonical baseline).
gauss <- gen_effects(J = 10L, true_dist = "Gaussian", sigma_tau = 0.2)
head(gauss)
#> # A tibble: 6 × 3
#>   site_index     z_j    tau_j
#>        <int>   <dbl>    <dbl>
#> 1          1  0.233   0.0466
#> 2          2  0.0311  0.00621
#> 3          3  0.358   0.0716
#> 4          4  1.61    0.322
#> 5          5  1.43    0.286
#> 6          6 -0.948  -0.190
# Student-t with df = 5 — heavier tails for a robustness check.
studentt <- gen_effects(J = 50L, true_dist = "StudentT",
                        sigma_tau = 0.2, theta_G = list(nu = 5))
# Mixture: two-component bimodal effects.
mix <- gen_effects(J = 50L, true_dist = "Mixture",
                   sigma_tau = 0.2,
                   theta_G = list(delta = 1.0, eps = 0.2, ups = 2.0))
table(mix$latent_component)
#>
#> 1 2
#> 42 8
# Covariate-adjusted: tau_j = tau + 0.3 * x_j + sigma_tau * z_j.
sites <- data.frame(x = rnorm(20))
cov <- gen_effects(J = 20L, true_dist = "Gaussian",
                   formula = ~ x, beta = 0.3, data = sites,
                   sigma_tau = 0.15)
# User callback (Convention A — standardized residuals).
my_g <- function(J) rnorm(J)
user <- gen_effects(J = 50L, g_fn = my_g) # auto-selects true_dist = "User"