Compute a stable content hash of a multisiteDGP simulation object, for checking whether two simulation runs produced bit-identical results. The hash is canonical: it normalizes column order, drops row names, selects only the stable diagnostics, and replaces callback functions with presence sentinels. As a result, the hash does not change when one callback function is swapped for another, but it does change when a callback is added or removed.
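The canonicalization steps described above can be illustrated with a minimal sketch built directly on the digest package. This is not the package's implementation: `sketch_canonical_hash()`, its `callbacks` argument, and the `"<function>"` sentinel are hypothetical names chosen for illustration, and the diagnostics allowlist is left out.

```r
library(digest)

# Hypothetical helper (NOT part of multisiteDGP): canonicalize, then hash.
sketch_canonical_hash <- function(df, callbacks = list(), algo = "xxhash64") {
  stopifnot(is.data.frame(df))
  # Normalize column order; the radix method sorts independently of locale.
  cols <- sort(names(df), method = "radix")
  # Rebuild a plain named list of columns, which also drops row names.
  canon <- setNames(lapply(cols, function(nm) df[[nm]]), cols)
  # Replace each callback function with a presence sentinel (assumed form).
  callbacks <- lapply(callbacks, function(f) {
    if (is.function(f)) "<function>" else f
  })
  digest(list(data = canon, callbacks = callbacks), algo = algo)
}

a <- data.frame(y = c(1.5, 2.5), x = c(1L, 2L), row.names = c("r1", "r2"))
b <- data.frame(x = c(1L, 2L), y = c(1.5, 2.5))

# Column order and row names do not affect the hash.
same_data <- identical(sketch_canonical_hash(a), sketch_canonical_hash(b))

# Swapping one callback for another leaves the hash unchanged ...
h_f <- sketch_canonical_hash(b, callbacks = list(hook = function(x) x))
h_g <- sketch_canonical_hash(b, callbacks = list(hook = function(x) x + 1))
same_callback_slot <- identical(h_f, h_g)

# ... but adding a callback changes it.
presence_matters <- !identical(h_f, sketch_canonical_hash(b))
```

In this sketch all three logical results are expected to be TRUE. The real function may canonicalize additional components, so hashes from the sketch should not be compared with those of canonical_hash().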

Usage

canonical_hash(
  x,
  algo = "xxhash64",
  columns_to_include = NULL,
  diagnostics_to_include = NULL
)

Arguments

x

A multisitedgp_data, multisitedgp_design, data frame, or other R object.

algo

Character. Hash algorithm passed to digest. Default "xxhash64": fast, with 16 hexadecimal characters of output, and suitable for typical reproducibility checks.

columns_to_include

Optional character vector of columns to include for data-frame-like objects. Columns are sorted before hashing. Default NULL (all canonical columns).

diagnostics_to_include

Optional character vector of diagnostic names to include. Default NULL (the blueprint's numeric-diagnostics allowlist).

Value

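A single character string holding the hexadecimal hash. Its width depends on algo: 16 characters for "xxhash64", 8 for "xxhash32", and other widths for other digest algorithms.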

Details

Cross-OS policy. Linux x86_64 / amd64 is the strict hash baseline for golden fixtures used in the package's regression tests. macOS and Windows are held to same-machine reproducibility and distributional parity rather than Linux byte-identical hashes — minor floating-point divergences across platforms are expected and do not indicate a bug. See system.file("REPRODUCIBILITY.md", package = "multisiteDGP") for the full installed policy.
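One way to apply this policy in a test suite is to compare against a golden hash only on the strict baseline platform. The base R sketch below assumes that approach; `is_strict_hash_baseline()` and `check_hash()` are hypothetical helpers, not functions exported by the package, and the installed REPRODUCIBILITY.md remains the authoritative statement of the policy.

```r
# Hypothetical helper: TRUE only on the strict baseline (Linux x86_64 / amd64).
is_strict_hash_baseline <- function(sysname = Sys.info()[["sysname"]],
                                    machine = Sys.info()[["machine"]]) {
  identical(sysname, "Linux") && machine %in% c("x86_64", "amd64")
}

# Hypothetical helper: compare against a golden hash only on the baseline.
# Returns NA off-baseline to signal "not compared" rather than a failure.
check_hash <- function(observed, golden, strict = is_strict_hash_baseline()) {
  if (strict) identical(observed, golden) else NA
}

# The platform can be passed explicitly to see how the gate behaves.
on_linux <- is_strict_hash_baseline("Linux", "x86_64")
on_mac   <- is_strict_hash_baseline("Darwin", "arm64")
```

Off the baseline, a test would typically fall back to a same-machine check, such as hashing two runs with the same seed and comparing them to each other.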

Use cases. (1) Save the hash alongside a published simulation result so future readers can verify reproduction. (2) Pin a regression test fixture so unintended pipeline changes are caught. (3) Detect whether two parallel workers produced the same output.
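Use cases (1) and (2) both come down to storing a hash string and later comparing a fresh hash against it. A minimal base R sketch follows; `save_hash()` and `verify_hash()` are hypothetical helpers, and the literal string stands in for the value that canonical_hash() would return in practice.

```r
# Hypothetical helpers: persist a hash string and check a fresh one against it.
save_hash <- function(hash, path) {
  writeLines(hash, path)
  invisible(path)
}

verify_hash <- function(hash, path) {
  identical(hash, readLines(path, n = 1L))
}

hash_file <- tempfile(fileext = ".txt")

# Stand-in for canonical_hash(dat); any hash string works the same way.
published <- "f367529f6b9347bf"
save_hash(published, hash_file)

# A later run with unchanged output reproduces the stored hash.
reproduced <- verify_hash(published, hash_file)

# A different hash indicates that the canonical content changed.
changed <- !verify_hash("0000000000000000", hash_file)
```

For use case (3), the same comparison applies without a file: collect one hash per worker and check that all of them are identical.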

For a worked reproducibility walkthrough see the Reproducibility and provenance vignette.

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286

See also

provenance_string() for the human-readable one-line provenance summary; the M7 Reproducibility and provenance vignette.

Other family-reproducibility: provenance_string()

Examples

dat <- sim_multisite(J = 10L, seed = 1L)
canonical_hash(dat)
#> [1] "f367529f6b9347bf"

# Same design / seed → same hash.
identical(canonical_hash(dat), canonical_hash(sim_multisite(J = 10L, seed = 1L)))
#> [1] TRUE