Simulate a direct-precision meta-analysis data-generating process

sim_meta() is the unified interface for the four-layer multisite-trial data-generating process under direct precision-scale targets. A meta-analytic design specifies the site count $J$, the latent-effect distribution, the mean informativeness $I = \sigma_\tau^2 / (\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2))$, where $\mathrm{GM}$ denotes the geometric mean across studies, and the heterogeneity ratio $R = \max_j \widehat{se}_j^2 / \min_j \widehat{se}_j^2$. Given such a design, sim_meta() composes the latent-effects, direct-precision-margin, dependence-alignment, and observation layers in one call and returns a multisitedgp_meta tibble with diagnostics and provenance attributes. The canonical hash is stored at attr(x, "provenance")$canonical_hash, not as a top-level attribute. sim_multisite is the sister site-size-driven (Paradigm A) interface; sim_meta() is the direct-precision (Paradigm B) front door, where precision targets are supplied directly rather than derived from per-site sample sizes.
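Both targets can be computed by hand from a vector of sampling variances. The following is a minimal base-R sketch of the two definitions; the helper names geo_mean, informativeness_sketch, and het_ratio_sketch and the toy values are illustrative assumptions, not package functions.

```r
# Hypothetical helpers illustrating the definitions of I and R (not package API).
geo_mean <- function(x) exp(mean(log(x)))

informativeness_sketch <- function(sigma_tau, se2) {
  sigma_tau^2 / (sigma_tau^2 + geo_mean(se2))
}

het_ratio_sketch <- function(se2) max(se2) / min(se2)

# Toy sampling variances for three studies (made-up values).
se2 <- c(0.05, 0.10, 0.20)

# GM(se2) = 0.10, so I = 0.04 / (0.04 + 0.10), about 0.286.
I_toy <- informativeness_sketch(sigma_tau = 0.20, se2 = se2)

# Largest over smallest sampling variance: 0.20 / 0.05 = 4.
R_toy <- het_ratio_sketch(se2)
```

A larger $I$ means the between-study variance dominates the typical sampling variance; a larger $R$ means the studies differ more in precision.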

Usage

sim_meta(design = NULL, ..., seed = NULL)

Arguments

design: Optional multisitedgp_design with paradigm = "direct". If NULL, ... is forwarded to multisitedgp_design with paradigm = "direct" automatically locked. Construct the design once when reusing it across multiple sim_meta() calls or a design_grid sweep.
...: Flat direct-precision design arguments used only when design = NULL. All must be named (positional ... is rejected). The key direct-path arguments are J (number of studies), I (mean informativeness target, $0 < I < 1$), R (heterogeneity ratio target, $R \ge 1$), tau (grand mean), sigma_tau (between-study SD), true_dist (one of "Gaussian", "StudentT", "SkewN", "ALD", "Mixture", "PointMassSlab", "User", "DPM"), shuffle, dependence, se_fn, se_args. See multisitedgp_design for the full list. Passing paradigm is an error (the wrapper locks it). Site-size arguments (nj_mean, cv, nj_min, engine, n_per_site) trigger a coherence error pointing to sim_multisite.
seed: Optional integer seed override. When supplied, replaces design$seed and gives bit-identical reruns. Use a small integer (e.g. 1L) for examples; use a 9-digit integer in production for cross-run uniqueness.
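The argument rules above (named-only ..., a locked paradigm, rejected site-size arguments, and the ranges for I and R) can be pictured as a small validation step. This is a base-R sketch under stated assumptions: check_direct_args_sketch is a hypothetical helper, and its error messages are invented for illustration; the package's actual checks and wording may differ.

```r
# Hypothetical validator mirroring the documented rules (not the package's code).
check_direct_args_sketch <- function(...) {
  args <- list(...)
  arg_names <- names(args)

  # Positional arguments in `...` are rejected: every element needs a name.
  if (length(args) > 0 && (is.null(arg_names) || any(arg_names == ""))) {
    stop("all arguments in `...` must be named", call. = FALSE)
  }

  # The wrapper locks the paradigm, so supplying it is an error.
  if ("paradigm" %in% arg_names) {
    stop("`paradigm` is locked to \"direct\" and cannot be supplied", call. = FALSE)
  }

  # Site-size arguments belong to the site-size-driven interface.
  site_size <- c("nj_mean", "cv", "nj_min", "engine", "n_per_site")
  bad <- intersect(arg_names, site_size)
  if (length(bad) > 0) {
    stop("site-size arguments (", paste(bad, collapse = ", "),
         ") belong to sim_multisite", call. = FALSE)
  }

  # Range checks on the two precision targets; [[ ]] avoids partial matching.
  if (!is.null(args[["I"]]) && !(args[["I"]] > 0 && args[["I"]] < 1)) {
    stop("`I` must satisfy 0 < I < 1", call. = FALSE)
  }
  if (!is.null(args[["R"]]) && !(args[["R"]] >= 1)) {
    stop("`R` must satisfy R >= 1", call. = FALSE)
  }
  invisible(args)
}

# A coherent direct-precision call passes silently.
ok <- check_direct_args_sketch(J = 12L, I = 0.30, R = 2)

# A site-size argument is rejected; capture the message instead of failing.
msg <- tryCatch(check_direct_args_sketch(J = 12L, nj_mean = 50),
                error = conditionMessage)
```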

Value

A multisitedgp_meta tibble (which inherits from multisitedgp_data) with one row per study and columns:

site_index: Integer study identifier $j = 1, \ldots, J$.
z_j: Standardized residual effect, drawn from a distribution with mean 0 and variance 1 (realized sample moments vary at finite $J$).
tau_j: Latent study-level effect on the response scale, $\tau + \sigma_\tau\,z_j$.
tau_j_hat: Observed study-level estimate $\widehat{\tau}_j$.
se_j, se2_j: Study-level SE and sampling variance $\widehat{se}_j^2$ from the direct grid (or se_fn output).
n_j: Always NA_integer_ — the direct-precision path has no site-size margin.

Plus the following attributes:

design: The locked multisitedgp_design with paradigm = "direct".
diagnostics: Group A / B / C / D diagnostics — I_hat, R_hat, realized Spearman / Pearson correlations, plus the meta-specific extras target_I, target_R, I_error, R_error (zero under the deterministic grid; NA_real_ under custom se_fn), I_exact, R_exact (logical exactness flags), shuffle, direct_se_method ("grid" or "custom"), direct_se_diagnostics, margin_engine. See compute_I.
provenance: Package version, R version, platform, resolved seed, canonical_hash, design_hash, and the call expression.
multisitedgp_version, paradigm: Convenience copies for quick attribute lookup.

Details

The simulation runs four generative layers in order:

Layer 1 — latent effects (gen_effects): Draws standardized site effects $z_j$ from one of eight built-in $G$ distributions and rescales to $\tau_j = \tau + \sigma_\tau\,z_j$.
Layer 2 — direct precision margin (gen_se_direct): Builds the per-site sampling variance $\widehat{se}_j^2$ as a deterministic grid that exactly hits the user-specified $(I, R)$ targets. The site-size column $n_j$ is NA_integer_ because no site-size margin is involved.
Layer 3 — precision dependence (align_rank_corr, align_copula_corr, align_hybrid_corr): Optionally aligns $\widehat{se}_j^2$ against $\tau_j$ to a target Spearman correlation, preserving both marginals exactly.
Layer 4 — observation draws (gen_observations): Draws $\widehat{\tau}_j \sim \mathcal{N}(\tau_j,\, \widehat{se}_j^2)$.
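The four layers can be mimicked in a few lines of base R. Solving the definition of $I$ for the geometric mean gives the target $\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 (1 - I) / I$, and any grid with that geometric mean and max/min ratio $R$ hits both targets. The sketch below uses a log-linear grid for this purpose, which is an assumption made for illustration and not necessarily the construction used by gen_se_direct. The function name sketch_direct_dgp is hypothetical, the effects are Gaussian only, and Layer 3 is skipped (no dependence).

```r
# Hypothetical stand-in for the four layers; not the package's implementation.
sketch_direct_dgp <- function(J, I, R, tau = 0, sigma_tau = 0.20, seed = 1L) {
  stopifnot(J >= 2, I > 0, I < 1, R >= 1, sigma_tau > 0)
  set.seed(seed)

  # Layer 1: standardized Gaussian effects, rescaled to the response scale.
  z_j <- rnorm(J)
  tau_j <- tau + sigma_tau * z_j

  # Layer 2: deterministic grid. The exponents are symmetric around zero, so
  # the geometric mean equals gm_target and max/min equals R (assumed design).
  gm_target <- sigma_tau^2 * (1 - I) / I
  u <- seq(-0.5, 0.5, length.out = J)
  se2_j <- gm_target * R^u

  # Layer 3: omitted here, which leaves se2_j unaligned with tau_j.

  # Layer 4: one normal draw per study around its latent effect.
  tau_j_hat <- rnorm(J, mean = tau_j, sd = sqrt(se2_j))

  data.frame(
    site_index = seq_len(J),
    z_j = z_j,
    tau_j = tau_j,
    tau_j_hat = tau_j_hat,
    se_j = sqrt(se2_j),
    se2_j = se2_j,
    n_j = NA_integer_  # no site-size margin on the direct path
  )
}

dat_sketch <- sketch_direct_dgp(J = 12L, I = 0.30, R = 2, sigma_tau = 0.20)

# Realized targets recovered from the grid.
I_hat <- 0.20^2 / (0.20^2 + exp(mean(log(dat_sketch$se2_j))))
R_hat <- max(dat_sketch$se2_j) / min(dat_sketch$se2_j)
```

Because the grid and the RNG usage here are assumptions, the simulated values will not reproduce sim_meta() output; only the $(I, R)$ bookkeeping carries over.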

The direct-precision path is the formal counterpart of the site-size-driven path covered by sim_multisite. Where the site-size path derives precision from sample sizes $n_j$, the direct-precision path takes the precision targets as inputs, and the package generates a deterministic SE grid that matches them exactly. Set shuffle = TRUE to randomize the SE-to-site assignment, using the same RNG stream as the rest of the pipeline (see RNG policy). The multiset of $\widehat{se}_j^2$ values is preserved, so the $(I, R)$ targets remain exact under permutation and only the assignment to sites changes. Supply a custom se_fn to plug in a non-grid SE distribution; when se_fn is non-NULL the deterministic-grid invariant no longer applies and the I_error and R_error diagnostics become NA_real_.
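The permutation invariance is easy to check numerically, since both $I$ and $R$ depend only on the multiset of sampling variances. The base-R sketch below assumes an illustrative log-linear grid (not necessarily the package's grid) and a hypothetical helper summarize_targets.

```r
# Assumed grid with targets I = 0.30, R = 2, sigma_tau = 0.20 (illustrative).
sigma_tau <- 0.20
se2 <- sigma_tau^2 * (1 - 0.30) / 0.30 * 2^seq(-0.5, 0.5, length.out = 12)

# Randomly reassign the same variances to sites.
set.seed(1L)
se2_shuffled <- sample(se2)

# Hypothetical helper: realized I and R from a vector of sampling variances.
summarize_targets <- function(se2, sigma_tau) {
  c(I = sigma_tau^2 / (sigma_tau^2 + exp(mean(log(se2)))),
    R = max(se2) / min(se2))
}

before <- summarize_targets(se2, sigma_tau)
after <- summarize_targets(se2_shuffled, sigma_tau)

# The values are the same set; only their order across sites differs.
same_multiset <- identical(sort(se2), sort(se2_shuffled))
```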

Site-size arguments (nj_mean, cv, nj_min, engine, n_per_site) are rejected with a coherence error pointing the caller to sim_multisite; they belong to Paradigm A, not the direct-precision path.

For a workflow walkthrough see the meta-analysis case study vignette. For the formal contract on direct-precision SE generation, see Margin and SE models — site-size and direct-precision paths.

RNG policy

If seed is NULL, the pipeline runs under the caller's active RNG state. If seed is a single integer, the full pipeline is wrapped in with_seed, so the caller's global RNG state is restored on exit. The direct SE grid itself is deterministic. The one exception is shuffle = TRUE with R > 1, which draws the permutation from the active RNG (the one used by sample), inside the same seed wrapper when seed is supplied.
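The restore-on-exit behaviour can be illustrated with a base-R stand-in. The function with_seed_sketch below is hypothetical and simplified: it saves and restores .Random.seed only and does not handle changes to the RNG kind, so it should be read as a sketch of the idea rather than the wrapper the package uses.

```r
# Hypothetical, simplified seed wrapper (not the package's with_seed).
with_seed_sketch <- function(seed, code) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL

  # Restore the caller's RNG state when the function exits, even on error.
  on.exit({
    if (is.null(old_seed)) {
      if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
        rm(".Random.seed", envir = globalenv())
      }
    } else {
      assign(".Random.seed", old_seed, envir = globalenv())
    }
  })

  set.seed(seed)
  force(code)  # `code` is lazily evaluated, so it runs after set.seed()
}

# Reference draw from the caller's stream.
set.seed(42)
expected_next <- runif(1)

# Same starting state, but with a seeded block run in between.
set.seed(42)
draw_1 <- with_seed_sketch(1L, rnorm(3))
actual_next <- runif(1)  # unaffected by the seeded block

# Rerunning with the same seed gives bit-identical draws.
draw_2 <- with_seed_sketch(1L, rnorm(3))
```

This mirrors the two guarantees stated above: a supplied seed gives reproducible output, and the caller's own random stream continues as if the seeded block had never run.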

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286.

Examples

# Minimal: a defensible meta-analysis preset, one call, read realized I.
dat <- sim_meta(design = preset_meta_modest(), seed = 1L)
attr(dat, "diagnostics")$I_hat
#> [1] 0.3

# Explicit direct-precision design via flat (J, I, R) targets.
dat <- sim_meta(J = 12L, I = 0.30, R = 2, sigma_tau = 0.20, seed = 1L)
summary(dat)
#> multisiteDGP simulation diagnostics
#> ------------------------------------------------------------
#> A. Realized vs Intended
#>    I (informativeness):         0.300  (target 0.300)  PASS  [rel=0.0%]
#>    R (SE heterogeneity):        2.000  (target 2.000)  PASS  [rel=0.0%]
#>    sigma_tau:                   0.162  (target 0.200)  FAIL  [rel=-18.9%]
#>    GM(se^2):                    0.093  (target 0.093)  PASS  [rel=-0.0%]
#> 
#> B. Dependence
#>    rank_corr residual:          0.063  (target 0.000)  PASS  [delta=0.063]
#>    rank_corr marginal:          0.063  (target N/A)  N/A   [residual target rows only; no finite target; status not assigned]
#>    pearson_corr residual:       0.219  (target 0.000)  FAIL  [delta=0.219]
#>    pearson_corr marginal:       0.219  (target N/A)  N/A   [residual target rows only; no finite target; status not assigned]
#> 
#> C. G shape fit
#>    KS distance D_J:             0.250  (target 0.000)  PASS  [p=0.869]
#>    Bhattacharyya BC:            0.333  (target 1.000)  FAIL  [rel=-66.7%]
#>    Q-Q residual:                0.896  (target 0.000)  N/A   [delta=0.896]
#> 
#> D. Operational feasibility
#>    mean shrinkage S:            0.302  (target N/A)  PASS  [no target]
#>    avg MOE (95%):               0.602  (target N/A)  WARN  [no target]
#>    feasibility_index:           3.624  (target N/A)  FAIL  [no target]
#> ------------------------------------------------------------
#> Overall: 6 PASS, 1 WARN, 4 FAIL.
#> Provenance: multisiteDGP 0.1.1 | paradigm=direct | seed=1 | canonical_hash=1fe5f5bf61f116dd | design_hash=02c80c06a86a2bed | hash_algo=xxhash64 | R=4.6.0 | hooks=none

# The grid generator is exact: I_error and R_error are zero.
diag <- attr(dat, "diagnostics")
diag$I_error  # 0
#> [1] 0
diag$R_error  # 0
#> [1] 0

if (FALSE) { # \dontrun{
  # Hand off to a meta-analytic estimator (requires {metafor}).
  # `as_metafor()` renames the canonical columns to metafor's (yi, vi, sei).
  metafor_obj <- as_metafor(dat)
  metafor::rma(yi = yi, vi = vi, data = metafor_obj)
} # }