
License: MIT

Data-generating processes for multisite trial simulations

You have a multisite trial to design, a meta-analysis to plan, or an estimator to stress-test. You need a scenario you can defend to a manuscript reviewer: one with citable provenance, realistic site-effect heterogeneity, plausible per-site sampling errors, and the dependence between effects and precisions that real trials exhibit. Instead of making you assemble latent effects, sampling-error margins, and dependence structures from raw variance components, multisiteDGP lets you specify a scenario through intuitive quantities: site counts, per-site sample sizes, a heterogeneity ratio, and a defensible preset.

Layered DGP

Four generative layers — latent effects, site-size margins, precision dependence, observation draws — with eight built-in distribution shapes and a single-call front door.

Get started

Defensible presets

Nine bundled scenarios — JEBS-paper, Walters-2024, Weiss-style education trials — each with a citation and a locked parameter set you can defend to a reviewer.

Choose a preset

Diagnostics

Read realized heterogeneity, effect-precision correlation, and distributional fit off a diagnostics attribute — verify the design behaves as intended before committing to a long simulation run.

Read diagnostics

The documentation runs on two tracks. Start with the Applied Track if you have a trial to design, a power calculation to deliver, or a case study to write. Start with the Methodological Track if you want the formal four-layer specification, the standardized-residual convention, and the contracts that govern every preset.

Installation

# Install from GitHub
# install.packages("devtools")
devtools::install_github("joonho112/multisiteDGP")

# Install from CRAN, when available
# install.packages("multisiteDGP")

Quick start

library(multisiteDGP)

# Simulate a defensible scenario; the hash at the end confirms it reproduces exactly.
# Scenario: 50-site education trial calibrated to the JEBS paper preset.
dat <- sim_multisite(preset_jebs_paper(), seed = 1L)

# Inspect the simulated dataset.
print(dat)
#> # A multisitedgp_data: 50 sites, paradigm = "site_size"
#> # Realized vs intended:
#> #   I: realized=0.250 (no target)
#> #   R: realized=7.583 (no target)
#> #   sigma_tau: target=0.200, realized=0.207, PASS
#> #   rho_S: target=0.000, realized=-0.193, PASS
#> #   Feasibility: WARN (n_eff=13.098)
#> # A tibble: 50 × 7
#>   site_index    z_j   tau_j tau_j_hat  se_j  se2_j   n_j
#>        <int>  <dbl>   <dbl>     <dbl> <dbl>  <dbl> <int>
#> 1          1 -0.582 -0.116    0.652   0.426 0.182     22
#> 2          2 -0.619 -0.124   -0.315   0.577 0.333     12
#> 3          3 -1.11  -0.222   -0.633   0.256 0.0656    61
#> # ℹ 47 more rows

# Read the realized informativeness off the diagnostics attribute.
attr(dat, "diagnostics")$I_hat
#> [1] 0.2500832

# Visualize the latent and observed site effects.
plot_effects(dat)

# Stamp the scenario for reproducibility — the hash is identical
# every time you rerun the same call.
canonical_hash(dat)
#> [1] "c52e75f276d82836"

From a single simulation call and a few one-line helpers, you have a citable scenario, a printed dataset, a realized-versus-target diagnostic, a publication-ready plot, and a stable provenance hash.
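To make the columns in the printed output concrete, here is a minimal base-R sketch of the layered idea. It is not the package's implementation. The relations `tau_j = sigma_tau * z_j` and `se2_j = 4 / n_j` are inferred from the three printed rows above. The site-size model, the independence between effects and precisions, and the formulas used for `I_hat`, `R_hat`, and `rho_S_hat` are illustrative assumptions, not documented definitions.

```r
# Base-R sketch of a layered multisite DGP. Illustrative only: this is NOT
# the multisiteDGP implementation, and the constants below are assumptions.
set.seed(1)

J         <- 50L    # number of sites
sigma_tau <- 0.20   # target SD of the latent site effects

# Layer 1: latent effects. Draw standardized effects, then scale them.
z_j   <- rnorm(J)
tau_j <- sigma_tau * z_j

# Layer 2: site sizes and sampling variances.
# Hypothetical size model: a floor of 10 plus a negative-binomial draw.
n_j   <- 10L + rnbinom(J, size = 2, mu = 30)
se2_j <- 4 / n_j    # ratio inferred from the printed rows, not from the docs
se_j  <- sqrt(se2_j)

# Layer 3: precision dependence. Left independent here (target rho_S = 0).

# Layer 4: observed estimates, one normal draw per site.
tau_j_hat <- rnorm(J, mean = tau_j, sd = se_j)

sim <- data.frame(site_index = seq_len(J),
                  z_j, tau_j, tau_j_hat, se_j, se2_j, n_j)

# Realized summaries. All four definitions are assumptions for illustration.
gm_se2 <- exp(mean(log(se2_j)))   # geometric mean of the sampling variances
realized <- list(
  sigma_tau_hat = sd(tau_j),                                 # realized effect SD
  rho_S_hat     = cor(tau_j, se2_j, method = "spearman"),    # rank correlation
  I_hat         = sigma_tau^2 / (sigma_tau^2 + gm_se2),      # informativeness
  R_hat         = max(se2_j) / min(se2_j)                    # variance ratio
)
str(realized)
```

Comparing `realized$sigma_tau_hat` with `sigma_tau` gives a hand-rolled version of the realized-versus-target check that the printed header reports.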

Vignettes

Applied Track

| Vignette | What you get |
|---|---|
| A1 · Getting started | First simulation in 5 minutes |
| A2 · Choosing a preset | Walk through the nine bundled presets |
| A3 · Diagnostics in practice | Verify a design before you simulate |
| A4 · Covariates and dependence | Inject effect-precision dependence |
| A5 · Calibrating to real data | Match a simulation to a target study |
| A6 · Case study — multisite trial | End-to-end education-trial example |
| A7 · Case study — meta-analysis | End-to-end meta-analysis example |
| A8 · Cookbook | Nine end-to-end recipes |

Methodological Track

| Vignette | What you get |
|---|---|
| M1 · The two-stage DGP | Mathematical framing of the four layers |
| M2 · G-distribution catalog | The eight built-in shapes, side by side |
| M3 · Margin and SE models | Site-size-driven and direct-precision paths |
| M4 · Precision dependence theory | Rank, copula, and hybrid methods compared |
| M5 · Custom G distributions | Bring your own latent distribution |
| M6 · Adapters and downstream | Adapter contract and round-trip invariants |
| M7 · Reproducibility and provenance | Seeds, hashes, and provenance strings |
| M8 · Migration from siteBayes2 | Translate legacy scenario specs |

Citation

Run citation("multisiteDGP") after install for the canonical software and JEBS-paper entries.

Support

This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D240078 to the University of Alabama. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

License

MIT © JoonHo Lee. See LICENSE.