Generate two-component Gaussian mixture latent site effects

Draw J standardized site effects from a two-component Gaussian mixture (the legacy JEBS parameterization) and apply the shared Layer 1 location-scale wrapper to produce \(\tau_j = \tau + X_j\boldsymbol{\beta} + \sigma_\tau\,z_j\). Reach for the mixture when you expect bimodal effects, contamination, or a subgroup of "outlier" sites whose effect distribution differs from the bulk.

Usage

gen_effects_mixture(
  J,
  tau = 0,
  sigma_tau = 0.2,
  delta,
  eps,
  ups,
  formula = NULL,
  beta = NULL,
  data = NULL
)

Arguments

J: Integer. Number of sites.
tau: Numeric. Grand mean on the response scale. Default 0.
sigma_tau: Numeric (\(\ge 0\)). Between-site standard deviation on the response scale. Default 0.20.
delta: Numeric (> 0). Component separation. Required — no default. Larger values produce more bimodal mixtures. Typical applied values: delta = 2 (mild bimodality), delta = 5 (clearly bimodal — the JEBS fixture).
eps: Numeric in (0, 1). Component-2 mixing weight. Required. eps = 0.3 puts 30% of sites in component 2; eps = 0.5 is balanced.
ups: Numeric (> 0). SD ratio \(\sigma_2 / \sigma_1\). Required. ups = 1 gives equal-spread components; ups = 2 gives a wider second component (the JEBS fixture).
formula: One-sided formula for site-level covariates, or NULL.
beta: Numeric coefficient vector matching formula, or NULL.
data: A data.frame with the predictors named in formula, or NULL.
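
The formula, beta, and data arguments feed the \(X_j\boldsymbol{\beta}\) term of the location-scale wrapper. The base-R sketch below reproduces that arithmetic by hand and does not call the package. The column name x1, the object names, and the choice to drop the intercept column (on the reasoning that tau already acts as the intercept) are assumptions made for illustration; the package's own handling of the design matrix may differ.

```r
# Hypothetical site-level data; the column name x1 is an illustrative assumption.
set.seed(1)
J <- 6L
site_data <- data.frame(x1 = c(-1, -0.5, 0, 0, 0.5, 1))

tau       <- 0.1   # grand mean on the response scale
sigma_tau <- 0.2   # between-site SD on the response scale
beta      <- 0.3   # one coefficient for the single predictor in ~ x1

# Assumption: drop the intercept column, since tau plays that role here.
X <- model.matrix(~ x1, data = site_data)[, -1, drop = FALSE]

# Stand-in for the unit-variance residuals; the real z_j come from the mixture.
z_j <- rnorm(J)

# Location-scale wrapper: tau_j = tau + X_j beta + sigma_tau * z_j
tau_j <- as.numeric(tau + X %*% beta + sigma_tau * z_j)
```

With covariates in play, sigma_tau describes the residual between-site spread around \(\tau + X_j\boldsymbol{\beta}\), not the total spread of tau_j.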

Value

A tibble with one row per site and columns site_index (integer 1:J), z_j (unit-variance mixture residual), tau_j (response-scale effect), latent_component (integer 1 or 2 — which component each draw came from), plus any covariate columns from data.

Details

The mixture is parameterized so that, before standardization, component 1 has mean \(-\epsilon\delta\) and SD 1, and component 2 has mean \((1 - \epsilon)\delta\) and SD \(\mathrm{ups}\), with mixing weights \((1 - \epsilon)\) on component 1 and \(\epsilon\) on component 2. These choices make the overall mixture mean zero, since \((1 - \epsilon)(-\epsilon\delta) + \epsilon(1 - \epsilon)\delta = 0\). The total variance before standardization is \((1 - \epsilon) + \epsilon\,\mathrm{ups}^2 + \epsilon(1 - \epsilon)\delta^2\); the package divides each draw by the square root of that variance to produce the unit-variance standardized residuals \(z_j\).
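
The total-variance expression follows from the law of total variance: the weighted within-component variances plus the weighted squared component means (the overall mean being zero). Writing \(V\) for the pre-standardization variance (a symbol introduced here for illustration only):

\[
V
= \underbrace{(1 - \epsilon)\cdot 1^2 + \epsilon\,\mathrm{ups}^2}_{\text{within components}}
+ \underbrace{(1 - \epsilon)(\epsilon\delta)^2 + \epsilon\,\bigl((1 - \epsilon)\delta\bigr)^2}_{\text{between components}}
= (1 - \epsilon) + \epsilon\,\mathrm{ups}^2 + \epsilon(1 - \epsilon)\delta^2 .
\]

The between-component part simplifies because \((1 - \epsilon)\epsilon^2 + \epsilon(1 - \epsilon)^2 = \epsilon(1 - \epsilon)\). As a worked case, the JEBS fixture (delta = 5, eps = 0.3, ups = 2) gives \(V = 0.7 + 1.2 + 5.25 = 7.15\), so each draw is divided by \(\sqrt{7.15} \approx 2.674\).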

This is the parameterization used in the mixture-shape fixtures of the JEBS paper (Lee et al., 2025), and the parameter names (delta, eps, ups) match its notation. Because each draw comes from one of two distinct components, the returned tibble also carries a latent_component column (integer 1 or 2) recording which component the draw came from. This is useful for diagnostics and for checking realized draws against the intended group memberships.
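
As one way to use latent_component for diagnostics, the base-R sketch below hand-simulates draws in the same column layout as the returned tibble and compares realized component shares and per-component means of z_j with their theoretical values. It is a sketch of the parameterization described above, not the package's implementation; the object names (sim, share_2, comp_means, expected_means) and the large n are assumptions chosen so that sample moments sit close to theory.

```r
set.seed(42)
n     <- 100000L   # large n so sample moments are close to theoretical ones
delta <- 5
eps   <- 0.3
ups   <- 2

# Component membership: 2 with probability eps, otherwise 1.
latent_component <- ifelse(runif(n) < eps, 2L, 1L)

# Unstandardized draws using the component means and SDs from Details.
mu  <- c(-eps * delta, (1 - eps) * delta)
sds <- c(1, ups)
u   <- rnorm(n, mean = mu[latent_component], sd = sds[latent_component])

# Total variance before standardization, then rescale to unit variance.
total_var <- (1 - eps) + eps * ups^2 + eps * (1 - eps) * delta^2
z_j <- u / sqrt(total_var)

sim <- data.frame(
  site_index       = seq_len(n),
  z_j              = z_j,
  latent_component = latent_component
)

# Diagnostics: realized share of component 2 and per-component means of z_j.
share_2        <- mean(sim$latent_component == 2L)
comp_means     <- tapply(sim$z_j, sim$latent_component, mean)
expected_means <- mu / sqrt(total_var)   # component means after standardization
```

For these settings the theoretical standardized component means are about -0.561 and 1.309. With only J = 50 sites, realized shares and means will scatter noticeably around those targets, which is the situation where inspecting latent_component is most informative.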

For the broader catalog and decision rubric, see the G-distribution catalog and standardization vignette.

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286.

Examples

# JEBS fixture: clearly bimodal, 30% in the wider second component.
mix <- gen_effects_mixture(J = 50L, delta = 5, eps = 0.3, ups = 2)
table(mix$latent_component)  # expect roughly 35 / 15; exact counts vary by draw
#> 
#>  1  2 
#> 33 17 

# Mild bimodality with equal-spread components.
gen_effects_mixture(J = 50L, delta = 2, eps = 0.5, ups = 1, sigma_tau = 0.15)
#> # A tibble: 50 × 4
#>    site_index      z_j    tau_j latent_component
#>         <int>    <dbl>    <dbl>            <int>
#>  1          1 -0.0811  -0.0122                 2
#>  2          2  1.84     0.276                  2
#>  3          3 -0.768   -0.115                  1
#>  4          4  1.81     0.272                  2
#>  5          5 -0.318   -0.0476                 2
#>  6          6 -0.132   -0.0198                 1
#>  7          7  0.350    0.0526                 2
#>  8          8  0.00846  0.00127                1
#>  9          9  1.39     0.209                  2
#> 10         10 -0.414   -0.0620                 1
#> # ℹ 40 more rows