
Permute the paired precision columns (se_j, se2_j, n_j) of a Layer 2 frame so the realized Spearman correlation between the standardized residual effect \(z_j\) and the sampling variance \(\widehat{se}_j^2\) approximates a target value rank_corr. The permutation preserves both marginals exactly — only the assignment of precision values to sites changes — and adds an attr(., "permutation_perm") integer vector recording the realized assignment.

Usage

align_rank_corr(
  upstream,
  rank_corr = 0,
  max_iter = 20000L,
  tol = 0.02,
  dependence_fn = NULL,
  ...
)

Arguments

upstream

A Layer 2 data frame with canonical columns site_index, z_j, tau_j, se_j, se2_j, n_j (typically the output of gen_site_sizes or gen_se_direct).

rank_corr

Numeric in [-1, 1]. Target Spearman correlation between z_j and se2_j. Default 0 (independence). Typical applied values: 0.3 (moderate positive — small sites tend to have larger effects), -0.5 (moderate negative — selection-on-effect-size in meta-analysis). |rank_corr| > 0.95 triggers a near-boundary warning.

max_iter

Integer (\(\ge 100\)). Maximum number of swap proposals. Default 20000L.

tol

Numeric (> 0). Absolute tolerance for the realized residual Spearman correlation. Default 0.02.

dependence_fn

Optional callback. See Details for the contract.

...

Additional arguments forwarded to dependence_fn.

Value

The upstream tibble with the paired precision columns (se_j, se2_j, n_j) permuted to approximate the target. Two attributes are attached: permutation_perm (a length-J integer vector recording the realized site-to-grid permutation) and dependence_diagnostics (a named list; as printed in the Examples, its fields are method, target_type, target, achieved, residual, converged, iterations, tol).

Details

Hill-climb algorithm. The algorithm starts from the upstream (Layer 2) ordering of (se2_j, se_j, n_j) and proposes random pair swaps. A swap is accepted if it brings the realized residual Spearman correlation closer to rank_corr; otherwise it is rejected. The search stops when the realized correlation lands within tol of the target or when max_iter swaps have been proposed, whichever comes first. The result is the best-effort permutation found within budget: for moderate targets (|rank_corr| ≤ 0.7) and the default max_iter = 20000L the algorithm converges reliably; as |rank_corr| approaches 1 the search may stall before convergence.
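The accept/reject loop described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: hill_climb_sketch() and its return fields are hypothetical names, it correlates against the raw z vector rather than a residualized one, and it permutes a single variance vector instead of the three paired columns.

```r
# Illustrative hill-climb on random pair swaps (hypothetical helper,
# not the package's implementation).
hill_climb_sketch <- function(z, se2, target, max_iter = 20000L, tol = 0.02) {
  J <- length(z)
  perm <- seq_len(J)                          # start from the upstream ordering
  rho <- cor(z, se2[perm], method = "spearman")
  iter <- 0L
  while (abs(rho - target) > tol && iter < max_iter) {
    iter <- iter + 1L
    ij <- sample.int(J, 2L)                   # propose two distinct sites
    cand <- perm
    cand[ij] <- perm[rev(ij)]                 # swap their assigned variances
    rho_cand <- cor(z, se2[cand], method = "spearman")
    if (abs(rho_cand - target) < abs(rho - target)) {
      perm <- cand                            # accept only strict improvements
      rho <- rho_cand
    }
  }
  list(perm = perm, realized = rho, iterations = iter,
       converged = abs(rho - target) <= tol)
}

set.seed(1)
z   <- rnorm(50)                  # stand-in standardized effects
se2 <- runif(50, 0.01, 0.1)       # stand-in sampling variances
fit <- hill_climb_sketch(z, se2, target = 0.3)
fit$realized
fit$converged
```

Because a proposal is accepted only when it strictly reduces the distance to the target, the distance never increases, which is also why the search can stall near the boundary where few improving swaps remain.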

Why "exact marginal preservation" matters. Because the algorithm only reorders the existing values, the marginals of se2_j and \(z_j\) are bit-identical to their Layer 2 inputs. Diagnostics computed on the marginals (informativeness, heterogeneity ratio, shrinkage) are unchanged by Layer 3; only the joint distribution and rank correlation move. If you need to introduce dependence by shifting marginals, use align_copula_corr (Gaussian copula) or align_hybrid_corr (the recommended default, which combines both).
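A small base-R check makes the marginal-preservation point concrete. The vectors below are simulated stand-ins rather than package output, and the permutation is arbitrary; the sketch only shows that reordering leaves the multiset of values untouched while the pairing with z changes.

```r
set.seed(42)
se2  <- runif(10, 0.01, 0.1)   # stand-in sampling variances
z    <- rnorm(10)              # stand-in standardized effects
perm <- sample.int(10)         # any reassignment of variances to sites
se2_perm <- se2[perm]

# The multiset of values is unchanged, so order-free marginal
# summaries agree exactly.
identical(sort(se2), sort(se2_perm))
identical(range(se2), range(se2_perm))

# Only the pairing with z, and hence the rank correlation, can differ.
cor(z, se2,      method = "spearman")
cor(z, se2_perm, method = "spearman")
```

Both identical() calls return TRUE. Summaries that depend on summation order, such as mean(), can differ in the last floating-point bits after reordering, so the comparison here is made on sorted values.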

Custom dependence_fn extensibility. Pass a dependence_fn callback to bypass the built-in hill-climb. The callback receives z_j, se2_j, target, and ..., and must return list(se2_j = <length-J numeric>, perm = <length-J integer>). The returned se2_j must be a permutation of the upstream se2_j multiset. The callback owns its own RNG.
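As a sketch of the callback contract, the function below returns a valid list(se2_j, perm) pair. The name extreme_dependence_fn is hypothetical, and two things are assumptions rather than documented behavior: that perm indexes the upstream vector (so the new se2_j equals the upstream se2_j[perm]), and that the arguments arrive under the names z_j, se2_j, and target. It ignores the magnitude of target and only uses its sign, so it illustrates the return shape, not a useful dependence model.

```r
# Hypothetical callback: comonotone assignment for target > 0,
# antimonotone for target < 0, upstream order for target == 0.
extreme_dependence_fn <- function(z_j, se2_j, target, ...) {
  J <- length(z_j)
  perm <- seq_len(J)
  if (target != 0) {
    z_rank <- rank(z_j, ties.method = "first")     # 1..J, no ties
    ord <- order(se2_j, decreasing = target < 0)   # variances by rank
    perm <- ord[z_rank]                            # match ranks site by site
  }
  # Assumption: perm indexes the upstream vector.
  list(se2_j = se2_j[perm], perm = as.integer(perm))
}

set.seed(7)
z   <- rnorm(8)
se2 <- runif(8, 0.01, 0.1)
out <- extreme_dependence_fn(z, se2, target = -0.5)

# The returned values are a permutation of the upstream multiset.
identical(sort(out$se2_j), sort(se2))
cor(z, out$se2_j, method = "spearman")
```

Under these assumptions the callback would be supplied as dependence_fn = extreme_dependence_fn. It draws no random numbers, so it has no RNG state of its own to manage.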

For the formal contrast among the three injection methods (rank, copula, hybrid) and a decision rubric on when to choose each, see the Precision dependence — three injection methods vignette.

RNG policy

The built-in hill-climb proposes swaps via sample.int() and accepts or rejects each one deterministically based on the resulting correlation. It therefore consumes R's active RNG stream; under a wrapper seed (sim_multisite() / sim_meta() with seed), runs are bit-identical. Custom dependence_fn callbacks own their own RNG.
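The reproducibility claim follows from how sample.int() draws from the session's RNG stream. The sketch below uses a hypothetical helper, draw_swaps(), to stand in for the sequence of swap proposals; it does not call the package.

```r
# Hypothetical stand-in for a sequence of swap proposals.
draw_swaps <- function(J, n) {
  # Each row is one proposed pair of distinct site indices.
  t(replicate(n, sample.int(J, 2L)))
}

set.seed(2025)
first <- draw_swaps(12L, 5L)

set.seed(2025)
second <- draw_swaps(12L, 5L)

# Same seed, same proposals, hence the same accept/reject path.
identical(first, second)

# Without reseeding, the stream has advanced and new proposals are drawn.
third <- draw_swaps(12L, 5L)
```

Since acceptance is a deterministic function of the proposed swap, fixing the seed before the call fixes the whole search trajectory.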

References

Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731–764. doi:10.3102/10769986241254286

See also

align_copula_corr for the Gaussian-copula alternative; align_hybrid_corr for the recommended default that combines copula initialization with hill-climb polish; realized_rank_corr for the realized Spearman after alignment; sim_multisite for the wrapper that calls this in the four-layer pipeline; the M4 Precision dependence theory vignette.

Other family-dependence: align_copula_corr(), align_hybrid_corr()

Examples

# Compose Layer 1 + 2 + 3 manually with a moderate positive target.
effects <- gen_effects_gaussian(J = 12L)
margins <- gen_site_sizes(effects, J = 12L, nj_mean = 40, cv = 0.2)
aligned <- align_rank_corr(margins, rank_corr = 0.3, max_iter = 1000L)
cor(aligned$z_j, aligned$se2_j, method = "spearman")
#> [1] 0.3017562

# Negative target — common in meta-analytic selection-on-precision scenarios.
aligned_neg <- align_rank_corr(margins, rank_corr = -0.5)
attr(aligned_neg, "dependence_diagnostics")
#> $method
#> [1] "rank"
#> 
#> $target_type
#> [1] "residual_spearman"
#> 
#> $target
#> [1] -0.5
#> 
#> $achieved
#> [1] -0.5193014
#> 
#> $residual
#> [1] -0.01930144
#> 
#> $converged
#> [1] TRUE
#> 
#> $iterations
#> [1] 1
#> 
#> $tol
#> [1] 0.02
#>