Simulate a direct-precision meta-analysis data-generating process
Source: R/wrapper-sim_meta.R
sim_meta() is the unified interface for the four-layer multisite-trial
data-generating process under direct precision-scale targets. Given a
meta-analytic design (site count \(J\), latent-effect distribution,
mean informativeness
\(I = \sigma_\tau^2 / (\sigma_\tau^2 + \mathrm{GM}(\widehat{se}_j^2))\),
and heterogeneity ratio \(R = \max_j \widehat{se}_j^2 / \min_j \widehat{se}_j^2\)),
it composes the latent-effects, direct-precision-margin,
dependence-alignment, and observation layers in one call and returns a
multisitedgp_meta tibble with diagnostics and provenance
attributes. The canonical hash is stored at
attr(x, "provenance")$canonical_hash, not as a top-level attribute.
sim_multisite() is the sister site-size-driven (Paradigm A) interface;
sim_meta() is the direct-precision (Paradigm B) front door, where
precision targets are supplied directly rather than derived from
per-site sample sizes.
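The two targets can be computed directly from a vector of sampling variances. The helper below is hypothetical (not part of the package API) and takes GM to be the geometric mean of \(\widehat{se}_j^2\); it is a sketch of the definitions above, not the package's compute_I.

```r
# Hypothetical helper, not part of the package API. GM is taken to be the
# geometric mean of the sampling variances.
informativeness_summary <- function(se2, sigma_tau) {
  stopifnot(all(se2 > 0), sigma_tau > 0)
  gm_se2 <- exp(mean(log(se2)))                 # GM(se_j^2)
  c(I  = sigma_tau^2 / (sigma_tau^2 + gm_se2),  # mean informativeness
    R  = max(se2) / min(se2),                   # heterogeneity ratio
    GM = gm_se2)
}

# Three studies with sampling variances 0.05, 0.10, 0.20 and sigma_tau = 0.2:
# GM = 0.1, so I = 0.04 / 0.14 (about 0.286) and R = 4.
informativeness_summary(c(0.05, 0.10, 0.20), sigma_tau = 0.2)
```

Solving the definition for the precision scale gives \(\mathrm{GM}(\widehat{se}_j^2) = \sigma_\tau^2 (1 - I) / I\); with \(\sigma_\tau = 0.20\) and \(I = 0.30\) that is about 0.093, matching the GM(se^2) row of the summary() output under Examples.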
Arguments
- design
Optional multisitedgp_design with paradigm = "direct". If NULL, the arguments in ... are forwarded to multisitedgp_design with paradigm = "direct" locked automatically. Construct a design once when reusing it across multiple sim_meta() calls or a design_grid sweep.
- ...
Flat direct-precision design arguments, used only when design = NULL. All must be named (positional ... is rejected). The key direct-path arguments are J (number of studies), I (mean informativeness target, \(0 < I < 1\)), R (heterogeneity ratio target, \(R \ge 1\)), tau (grand mean), sigma_tau (between-study SD), true_dist (one of "Gaussian", "StudentT", "SkewN", "ALD", "Mixture", "PointMassSlab", "User", "DPM"), shuffle, dependence, se_fn, and se_args. See multisitedgp_design for the full list. Passing paradigm is an error (the wrapper locks it). Site-size arguments (nj_mean, cv, nj_min, engine, n_per_site) trigger a coherence error pointing to sim_multisite.
- seed
Optional integer seed override. When supplied, it replaces design$seed and gives bit-identical reruns. Use a small integer (e.g. 1L) for examples and a 9-digit integer in production for cross-run uniqueness.
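The argument rules above can be sketched as a small validator. check_direct_args() is hypothetical, written only to illustrate the named-arguments rule, the locked paradigm, and the site-size coherence check; the real checks inside sim_meta() may differ in order and wording.

```r
# Hypothetical sketch of the flat-argument rules; not the package's code.
check_direct_args <- function(...) {
  args <- list(...)
  nms <- names(args)
  # Rule 1: every flat design argument must be named.
  if (length(args) > 0 && (is.null(nms) || any(nms == ""))) {
    stop("all design arguments must be named", call. = FALSE)
  }
  # Rule 2: the wrapper locks paradigm = "direct".
  if ("paradigm" %in% nms) {
    stop("`paradigm` is locked to \"direct\" by sim_meta()", call. = FALSE)
  }
  # Rule 3: site-size (Paradigm A) arguments are incoherent here.
  site_size <- c("nj_mean", "cv", "nj_min", "engine", "n_per_site")
  bad <- intersect(nms, site_size)
  if (length(bad) > 0) {
    stop("site-size arguments (", paste(bad, collapse = ", "),
         ") belong to sim_multisite()", call. = FALSE)
  }
  invisible(args)
}

check_direct_args(J = 12L, I = 0.30, R = 2)   # passes silently
```

Calling check_direct_args(J = 12L, cv = 0.5) stops with a message pointing to sim_multisite().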
Value
A multisitedgp_meta tibble (which inherits from
multisitedgp_data) with one row per study and the columns:
- site_index: Integer study identifier \(j = 1, \ldots, J\).
- z_j: Standardized residual effect (mean 0, variance 1).
- tau_j: Latent study-level effect on the response scale, \(\tau_j = \tau + \sigma_\tau\,z_j\).
- tau_j_hat: Observed study-level estimate \(\widehat{\tau}_j\).
- se_j, se2_j: Study-level SE \(\widehat{se}_j\) and sampling variance \(\widehat{se}_j^2\) from the direct grid (or from se_fn output).
- n_j: Always NA_integer_; the direct-precision path has no site-size margin.
Plus the following attributes:
- design: The locked multisitedgp_design with paradigm = "direct".
- diagnostics: Group A / B / C / D diagnostics, including I_hat, R_hat, and realized Spearman / Pearson correlations, plus the meta-specific extras target_I, target_R, I_error, R_error (zero under the deterministic grid; NA_real_ under a custom se_fn), I_exact, R_exact (logical exactness flags), shuffle, direct_se_method ("grid" or "custom"), direct_se_diagnostics, and margin_engine. See compute_I.
- provenance: Package version, R version, platform, resolved seed, canonical_hash, design_hash, and the call expression.
- multisitedgp_version, paradigm: Convenience copies for quick attribute lookup.
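Because the canonical hash lives inside the provenance list rather than as its own attribute, it has to be read with a nested lookup. The sketch below uses a mock data frame, not a real multisitedgp_meta object, and a placeholder hash value.

```r
# Mock object (not produced by the package): a data frame carrying a
# "provenance" attribute shaped like the one described above.
x <- data.frame(site_index = 1:3, n_j = NA_integer_)
attr(x, "provenance") <- list(seed = 1L, canonical_hash = "0000000000000000")

attr(x, "canonical_hash")              # NULL: not a top-level attribute
attr(x, "provenance")$canonical_hash   # nested lookup returns the hash
```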
Details
The simulation runs four generative layers in order:
- Layer 1: latent effects (gen_effects). Draws standardized site effects \(z_j\) from one of the eight supported \(G\) distributions (see true_dist) and rescales them to \(\tau_j = \tau + \sigma_\tau\,z_j\).
- Layer 2: direct precision margin (gen_se_direct). Builds the per-site sampling variances \(\widehat{se}_j^2\) as a deterministic grid that exactly hits the user-specified \((I, R)\) targets (unless a custom se_fn is supplied). The site-size column \(n_j\) is NA_integer_ because no site-size margin is involved.
- Layer 3: precision dependence (align_rank_corr, align_copula_corr, align_hybrid_corr). Optionally aligns \(\widehat{se}_j^2\) against \(\tau_j\) to a target Spearman correlation, preserving both marginals exactly.
- Layer 4: observation draws (gen_observations). Draws \(\widehat{\tau}_j \sim \mathcal{N}(\tau_j,\, \widehat{se}_j^2)\).
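The four layers can be sketched end to end in base R. sketch_meta() is hypothetical and is not how the package implements gen_effects, gen_se_direct, the align_* functions, or gen_observations. It assumes a Gaussian \(G\), a geometric SE grid that is symmetric on the log scale (one way to hit \((I, R)\) exactly, taking GM as the geometric mean), and an Iman-Conover-style reordering that reaches the Spearman target only approximately.

```r
# Hypothetical four-layer sketch in base R (not the package's internals).
sketch_meta <- function(J = 12, I = 0.30, R = 2, tau = 0,
                        sigma_tau = 0.20, rank_target = 0) {
  stopifnot(J >= 2, I > 0, I < 1, R >= 1, abs(rank_target) <= 1)

  # Layer 1: standardized latent effects (population mean 0, variance 1),
  # rescaled to the response scale.
  z_j <- rnorm(J)
  tau_j <- tau + sigma_tau * z_j

  # Layer 2: invert I to get the target GM(se^2), then spread a geometric
  # grid symmetrically on the log scale so GM is kept and max/min = R.
  gm_target <- sigma_tau^2 * (1 - I) / I
  se2_j <- gm_target * R^seq(-0.5, 0.5, length.out = J)

  # Layer 3: optional rank alignment. A latent Gaussian score correlated
  # with the normal scores of tau_j decides which site gets which se2
  # value, so the multiset of se2 values (and hence I and R) is unchanged.
  if (rank_target != 0) {
    rho <- 2 * sin(pi * rank_target / 6)       # Spearman -> Pearson, Gaussian case
    u <- qnorm(rank(tau_j) / (J + 1))          # normal scores of tau_j
    w <- rho * u + sqrt(1 - rho^2) * rnorm(J)  # latent alignment score
    se2_j <- sort(se2_j)[rank(w, ties.method = "first")]
  }

  # Layer 4: observed estimates tau_j_hat ~ N(tau_j, se2_j).
  tau_j_hat <- rnorm(J, mean = tau_j, sd = sqrt(se2_j))

  data.frame(site_index = seq_len(J), z_j, tau_j, tau_j_hat,
             se_j = sqrt(se2_j), se2_j, n_j = NA_integer_)
}

set.seed(1)
d <- sketch_meta(J = 12, I = 0.30, R = 2, sigma_tau = 0.20, rank_target = 0.5)
head(d)
```

Because Layer 3 only permutes the grid values, the realized I and R in this sketch are unaffected by alignment; the realized sigma_tau still varies from draw to draw because z_j is not re-standardized in-sample.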
The direct-precision path is the formal counterpart of the site-size-driven
path covered by sim_multisite: where the site-size path
derives precision from sample sizes \(n_j\), the direct-precision path
takes the precision targets as inputs and generates a deterministic
SE grid that matches them exactly. Set shuffle = TRUE to randomize the
SE-to-site assignment under the wrapped seed; the multiset of
\(\widehat{se}_j^2\) values is preserved, so the \((I, R)\) targets remain
exact (both \(\mathrm{GM}(\widehat{se}_j^2)\) and the max/min ratio depend
only on the values, not their order) while the assignment to sites
changes. Supply a custom se_fn to plug in a non-grid SE distribution;
when se_fn is non-NULL the deterministic-grid invariant no longer
applies and the I_error / R_error diagnostic slots become NA_real_.
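The permutation-invariance argument can be checked numerically. This sketch uses illustrative values (\(\sigma_\tau = 0.20\), \(I = 0.30\), \(R = 2\), \(J = 12\)), a hypothetical geometric grid standing in for the package's grid, and base sample() for the shuffle; it takes GM to be the geometric mean.

```r
# Illustrative values; the geometric grid is an assumption, not gen_se_direct.
sigma_tau <- 0.20
se2_grid <- (sigma_tau^2 * (1 - 0.30) / 0.30) * 2^seq(-0.5, 0.5, length.out = 12)

# I and R as defined above (GM = geometric mean).
I_of <- function(se2) sigma_tau^2 / (sigma_tau^2 + exp(mean(log(se2))))
R_of <- function(se2) max(se2) / min(se2)

set.seed(1)
se2_shuffled <- sample(se2_grid)   # new site assignment, same multiset of values

c(I_grid = I_of(se2_grid), I_shuffled = I_of(se2_shuffled),
  R_grid = R_of(se2_grid), R_shuffled = R_of(se2_shuffled))
```

Both pairs agree (up to floating-point summation order), while which site holds which variance has changed.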
Site-size arguments (nj_mean, cv, nj_min, engine, n_per_site)
are rejected with a coherence error pointing the caller to
sim_multisite; they belong to Paradigm A, not the
direct-precision path.
For a workflow walkthrough see the meta-analysis case study vignette. For the formal contract on direct-precision SE generation, see Margin and SE models — site-size and direct-precision paths.
RNG policy
If seed is NULL, the pipeline runs under the caller's active RNG state.
If seed is a single integer, the full pipeline is wrapped in
with_seed, so the caller's global RNG state is
restored on exit. The direct SE grid itself is deterministic. The one
exception is shuffle = TRUE with R > 1, where the permutation draws from
the active sample RNG (inside the same with_seed wrapper when seed is
supplied); with R = 1 every \(\widehat{se}_j^2\) is equal, so a
permutation would change nothing.
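The seed-wrapping contract can be illustrated with a base-R stand-in. with_seed_sketch() is hypothetical and only mimics the behavior described above (a fixed seed inside, the caller's .Random.seed restored on exit); it does not handle changes of RNG kind, which a production helper would also need to cover.

```r
# Hypothetical stand-in for the with_seed wrapping described above.
with_seed_sketch <- function(seed, code) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = globalenv())
  on.exit({
    # Restore (or remove) the caller's RNG state on exit.
    if (had_seed) {
      assign(".Random.seed", old, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  code   # lazily evaluated here, after the seed is set
}

set.seed(42)
before <- .Random.seed
a <- with_seed_sketch(1L, rnorm(3))
b <- with_seed_sketch(1L, rnorm(3))
identical(a, b)                   # bit-identical reruns
identical(before, .Random.seed)   # caller's RNG state restored
```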
References
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286.
See also
sim_multisite for the site-size-driven (Paradigm A)
sister wrapper;
multisitedgp_design for explicit design construction and
validation;
preset_meta_modest for a defensible meta-analysis
starting design;
gen_se_direct for the underlying Layer 2 direct-precision
generator;
design_grid for scenario-grid sweeps;
the meta-analysis
case study vignette.
Other family-wrappers:
sim_multisite()
Examples
# Minimal: a defensible meta-analysis preset, one call, read realized I.
dat <- sim_meta(design = preset_meta_modest(), seed = 1L)
attr(dat, "diagnostics")$I_hat
#> [1] 0.3
# Explicit direct-precision design via flat (J, I, R) targets.
dat <- sim_meta(J = 12L, I = 0.30, R = 2, sigma_tau = 0.20, seed = 1L)
summary(dat)
#> multisiteDGP simulation diagnostics
#> ------------------------------------------------------------
#> A. Realized vs Intended
#> I (informativeness): 0.300 (target 0.300) PASS [rel=0.0%]
#> R (SE heterogeneity): 2.000 (target 2.000) PASS [rel=0.0%]
#> sigma_tau: 0.162 (target 0.200) FAIL [rel=-18.9%]
#> GM(se^2): 0.093 (target 0.093) PASS [rel=-0.0%]
#>
#> B. Dependence
#> rank_corr residual: 0.063 (target 0.000) PASS [delta=0.063]
#> rank_corr marginal: 0.063 (target N/A) N/A [residual target rows only; no finite target; status not assigned]
#> pearson_corr residual: 0.219 (target 0.000) FAIL [delta=0.219]
#> pearson_corr marginal: 0.219 (target N/A) N/A [residual target rows only; no finite target; status not assigned]
#>
#> C. G shape fit
#> KS distance D_J: 0.250 (target 0.000) PASS [p=0.869]
#> Bhattacharyya BC: 0.333 (target 1.000) FAIL [rel=-66.7%]
#> Q-Q residual: 0.896 (target 0.000) N/A [delta=0.896]
#>
#> D. Operational feasibility
#> mean shrinkage S: 0.302 (target N/A) PASS [no target]
#> avg MOE (95%): 0.602 (target N/A) WARN [no target]
#> feasibility_index: 3.624 (target N/A) FAIL [no target]
#> ------------------------------------------------------------
#> Overall: 6 PASS, 1 WARN, 4 FAIL.
#> Provenance: multisiteDGP 0.1.1 | paradigm=direct | seed=1 | canonical_hash=1fe5f5bf61f116dd | design_hash=02c80c06a86a2bed | hash_algo=xxhash64 | R=4.6.0 | hooks=none
# The grid generator is exact: I_error and R_error are zero.
diags <- attr(dat, "diagnostics")  # avoid masking base::diag()
diags$I_error # 0
#> [1] 0
diags$R_error # 0
#> [1] 0
if (FALSE) { # \dontrun{
# Hand off to a meta-analytic estimator (requires {metafor}).
# `as_metafor()` renames the canonical columns to metafor's (yi, vi, sei).
metafor_obj <- as_metafor(dat)
metafor::rma(yi = yi, vi = vi, data = metafor_obj)
} # }