Complete API Reference with Examples
JoonHo Lee
2026-01-01
Source:vignettes/api-reference.Rmd
api-reference.RmdOverview
This vignette provides a complete API reference for the DPprior package. All exported functions are documented with their full signatures, parameter descriptions, return values, and usage examples.
The package functions are organized into six categories:
| Category | Description | Key Functions |
|---|---|---|
| Core Elicitation | Main user-facing calibration functions |
DPprior_fit(), DPprior_dual()
|
| Exact Computation | Stirling numbers and exact distributions |
compute_log_stirling(),
pmf_K_given_alpha()
|
| Approximation | Closed-form A1 and Newton-based A2 methods |
DPprior_a1(), DPprior_a2_newton()
|
| Weight Distribution | First stick-breaking weight and co-clustering |
mean_w1(), prob_w1_exceeds()
|
| Diagnostics | Prior validation and dominance risk assessment |
DPprior_diagnostics(), summary()
|
| Utility | Helper functions for conversions and integration |
vif_to_variance(), integrate_gamma()
|
1. Core Elicitation Functions
These are the primary user-facing functions for eliciting Gamma hyperpriors on the DP concentration parameter .
1.1 DPprior_fit()
Unified Interface for Prior Elicitation
The main entry point for eliciting a Gamma hyperprior. Automatically selects the appropriate algorithm based on input specification.
DPprior_fit(
J, # Sample size (required)
mu_K, # Target E[K_J] (required)
var_K = NULL, # Target Var(K_J) (optional)
confidence = NULL, # "low", "medium", "high" (optional)
method = "A2-MN", # "A1", "A2-MN", "A2-KL"
check_diagnostics = FALSE,
warn_dominance = TRUE,
M = 80L, # Quadrature nodes
verbose = FALSE
)Parameters:
| Parameter | Type | Description |
|---|---|---|
J |
Integer | Sample size (number of observations/sites). Must be . |
mu_K |
Numeric | Target prior mean . Must satisfy . |
var_K |
Numeric | Target prior variance
.
If NULL, computed from confidence. |
confidence |
Character | Qualitative uncertainty: "low" (VIF=4),
"medium" (VIF=2.5), "high" (VIF=1.5). |
method |
Character | Algorithm: "A1" (closed-form), "A2-MN"
(Newton), "A2-KL" (KL minimization). |
check_diagnostics |
Logical | If TRUE, compute full diagnostics. |
warn_dominance |
Logical | If TRUE, warn when
. |
M |
Integer | Number of Gauss-Laguerre quadrature nodes. |
verbose |
Logical | Print iteration progress. |
Returns: An S3 object of class
DPprior_fit with components:
| Component | Description |
|---|---|
a |
Optimal Gamma shape parameter |
b |
Optimal Gamma rate parameter |
J |
Sample size |
target |
List with mu_K, var_K,
var_K_used
|
fit |
Achieved mu_K, var_K, and
residual
|
method |
Algorithm used |
converged |
Logical convergence indicator |
iterations |
Number of iterations (0 for A1) |
diagnostics |
Optional diagnostic information |
Examples:
# Basic usage with confidence level
fit1 <- DPprior_fit(J = 50, mu_K = 5, confidence = "medium")
#> Warning: HIGH DOMINANCE RISK: P(w1 > 0.5) = 49.7% exceeds 40%.
#> This may indicate unintended prior behavior (RN-07).
#> Consider using DPprior_dual() for weight-constrained elicitation.
#> See ?DPprior_diagnostics for interpretation.
print(fit1)
#> DPprior Prior Elicitation Result
#> =============================================
#>
#> Gamma Hyperprior: α ~ Gamma(a = 1.4082, b = 1.0770)
#> E[α] = 1.308, SD[α] = 1.102
#>
#> Target (J = 50):
#> E[K_J] = 5.00
#> Var(K_J) = 10.00
#> (from confidence = 'medium')
#>
#> Achieved:
#> E[K_J] = 5.000000, Var(K_J) = 10.000000
#> Residual = 3.94e-10
#>
#> Method: A2-MN (7 iterations)
#>
#> Dominance Risk: HIGH ✘ (P(w₁>0.5) = 50%)
# Direct variance specification
fit2 <- DPprior_fit(J = 50, mu_K = 5, var_K = 10)
#> Warning: HIGH DOMINANCE RISK: P(w1 > 0.5) = 49.7% exceeds 40%.
#> This may indicate unintended prior behavior (RN-07).
#> Consider using DPprior_dual() for weight-constrained elicitation.
#> See ?DPprior_diagnostics for interpretation.
cat("Gamma(", round(fit2$a, 4), ", ", round(fit2$b, 4), ")\n", sep = "")
#> Gamma(1.4082, 1.077)
# Using A1 closed-form (faster for large J)
fit3 <- DPprior_fit(J = 200, mu_K = 15, var_K = 30, method = "A1")
# With diagnostics
fit4 <- DPprior_fit(J = 50, mu_K = 5, var_K = 8, check_diagnostics = TRUE)
#> Warning: HIGH DOMINANCE RISK: P(w1 > 0.5) = 48.1% exceeds 40%.
#> This may indicate unintended prior behavior (RN-07).
#> Consider using DPprior_dual() for weight-constrained elicitation.
#> See ?DPprior_diagnostics for interpretation.1.2 DPprior_dual()
Dual-Anchor Elicitation with Weight Control
Extends moment-based elicitation to incorporate constraints on the stick-breaking weight distribution, addressing the “unintended prior” problem identified in RN-07.
DPprior_dual(
fit_K, # K-only DPprior_fit object (required)
w1_target, # Weight target specification (required)
lambda = 0.7, # Trade-off parameter [0, 1]
loss_type = "adaptive", # "absolute", "relative", "adaptive"
method = "L-BFGS-B", # Optimization method
max_iter = 100L,
tol = 1e-6,
M = 80L,
verbose = FALSE
)Parameters:
| Parameter | Type | Description |
|---|---|---|
fit_K |
DPprior_fit | Initial K-only fit from DPprior_fit() or
DPprior_a2_newton(). |
w1_target |
List | Weight target: list(prob = list(threshold, value)) for
. |
lambda |
Numeric | Trade-off: 0 = pure weight, 1 = pure K. Default: 0.7. |
loss_type |
Character | Loss function type for weight anchor. |
method |
Character | Optimization algorithm for optim(). |
max_iter |
Integer | Maximum optimization iterations. |
tol |
Numeric | Convergence tolerance. |
Returns: A DPprior_fit object with
additional dual_anchor component containing:
| Component | Description |
|---|---|
lambda |
Trade-off parameter used |
w1_target |
Original weight target |
w1_achieved |
Achieved weight statistics |
K_achieved |
Achieved K moments |
init |
Initial K-only parameters |
Examples:
# Step 1: Fit K-only prior
fit_K <- DPprior_fit(J = 50, mu_K = 5, var_K = 8)
#> Warning: HIGH DOMINANCE RISK: P(w1 > 0.5) = 48.1% exceeds 40%.
#> This may indicate unintended prior behavior (RN-07).
#> Consider using DPprior_dual() for weight-constrained elicitation.
#> See ?DPprior_diagnostics for interpretation.
cat("K-only: P(w1 > 0.5) =",
round(prob_w1_exceeds(0.5, fit_K$a, fit_K$b), 3), "\n")
#> K-only: P(w1 > 0.5) = 0.481
# Step 2: Apply weight constraint
w1_target <- list(prob = list(threshold = 0.5, value = 0.30))
fit_dual <- DPprior_dual(fit_K, w1_target, lambda = 0.5)
cat("Dual: P(w1 > 0.5) =",
round(prob_w1_exceeds(0.5, fit_dual$a, fit_dual$b), 3), "\n")
#> Dual: P(w1 > 0.5) = 0.438
# Compare parameters
cat("\nParameter comparison:\n")
#>
#> Parameter comparison:
cat(" K-only: Gamma(", round(fit_K$a, 3), ", ", round(fit_K$b, 3), ")\n", sep = "")
#> K-only: Gamma(2.036, 1.605)
cat(" Dual: Gamma(", round(fit_dual$a, 3), ", ", round(fit_dual$b, 3), ")\n", sep = "")
#> Dual: Gamma(2.575, 1.834)2. Exact Computation Functions
These functions provide exact computation of Stirling numbers, PMFs, and moments using numerically stable log-space arithmetic.
2.1 Stirling Number Functions
compute_log_stirling()
Pre-compute Log Stirling Numbers
Computes the logarithm of unsigned Stirling numbers of the first kind
for all
from 0 to J_max and
from 0 to
.
compute_log_stirling(J_max)Parameters:
| Parameter | Type | Description |
|---|---|---|
J_max |
Integer | Maximum sample size. Must be . |
Returns: A lower triangular matrix of dimension
.
Entry [J+1, k+1] contains
(using R’s 1-based indexing).
Examples:
# Pre-compute for J up to 100
logS <- compute_log_stirling(100)
# Access |s(10,3)| = 9450
s_10_3 <- exp(logS[11, 4])
cat("|s(10,3)| =", round(s_10_3), "\n")
#> |s(10,3)| = 1172700
# Verify row sum identity: sum_k |s(J,k)| = J!
J <- 6
row_sum <- sum(exp(logS[J+1, 2:(J+1)]))
cat("sum |s(6,k)| =", round(row_sum), ", 6! =", factorial(6), "\n")
#> sum |s(6,k)| = 720 , 6! = 720
get_log_stirling()
Safe Accessor with Bounds Checking
get_log_stirling(J, k, logS)Returns
with automatic bounds checking. Returns -Inf for invalid
indices
(
or
).
get_stirling_row()
Extract Row of Stirling Numbers
get_stirling_row(J, logS)Returns a vector of for .
2.2 Conditional PMF Functions
pmf_K_given_alpha()
Antoniak Distribution PMF
Computes the exact conditional PMF using the Antoniak distribution formula:
pmf_K_given_alpha(J, alpha, logS, normalize = TRUE)Parameters:
| Parameter | Type | Description |
|---|---|---|
J |
Integer | Sample size. |
alpha |
Numeric | Concentration parameter (scalar, ). |
logS |
Matrix | Pre-computed log-Stirling matrix. |
normalize |
Logical | If TRUE, ensure PMF sums to 1. |
Returns: Numeric vector of length containing for . Note: always.
Examples:
J <- 50
alpha <- 2.0
logS <- compute_log_stirling(J)
# Compute conditional PMF
pmf <- pmf_K_given_alpha(J, alpha, logS)
# Find mode
mode_k <- which.max(pmf) - 1
cat("Mode of K|α=2:", mode_k, "\n")
#> Mode of K|α=2: 7
# Verify normalization
cat("PMF sum:", sum(pmf), "\n")
#> PMF sum: 1
cdf_K_given_alpha()
Conditional CDF
cdf_K_given_alpha(J, alpha, logS)Returns the cumulative distribution function .
quantile_K_given_alpha()
Conditional Quantile Function
quantile_K_given_alpha(p, J, alpha, logS)Returns the smallest such that .
2.3 Conditional Moment Functions
mean_K_given_alpha()
Conditional Mean via Digamma
Computes using the digamma function.
mean_K_given_alpha(J, alpha)Parameters:
| Parameter | Type | Description |
|---|---|---|
J |
Integer | Sample size (). |
alpha |
Numeric | Concentration parameter (vectorized). |
Returns: Numeric vector of conditional means.
Examples:
J <- 50
# Single alpha
mean_K_given_alpha(J, 2.0)
#> [1] 7.037626
# Vectorized
alpha_seq <- c(0.5, 1, 2, 5, 10)
means <- mean_K_given_alpha(J, alpha_seq)
data.frame(alpha = alpha_seq, E_K = round(means, 2))
#> alpha E_K
#> 1 0.5 2.94
#> 2 1.0 4.50
#> 3 2.0 7.04
#> 4 5.0 12.46
#> 5 10.0 18.34
var_K_given_alpha()
Conditional Variance via Trigamma
Computes .
var_K_given_alpha(J, alpha)Key Property: Conditional underdispersion always holds:
Examples:
J <- 50
alpha <- 2.0
mu <- mean_K_given_alpha(J, alpha)
sigma2 <- var_K_given_alpha(J, alpha)
cat("E[K|α=2] =", round(mu, 4), "\n")
#> E[K|α=2] = 7.0376
cat("Var(K|α=2) =", round(sigma2, 4), "\n")
#> Var(K|α=2) = 4.5356
cat("Underdispersion ratio:", round(sigma2/mu, 4), "< 1 ✓\n")
#> Underdispersion ratio: 0.6445 < 1 ✓
moments_K_given_alpha()
Both Moments in One Call
moments_K_given_alpha(J, alpha)Returns a list with mean and var.
2.4 Marginal Distribution Functions
exact_K_moments()
Marginal Moments under Gamma Hyperprior
Computes and when using Gauss-Laguerre quadrature.
exact_K_moments(J, a, b, M = 80L)Parameters:
| Parameter | Type | Description |
|---|---|---|
J |
Integer | Sample size. |
a |
Numeric | Gamma shape parameter. |
b |
Numeric | Gamma rate parameter. |
M |
Integer | Number of quadrature nodes. |
Returns: A list with components:
| Component | Description |
|---|---|
mean |
Marginal mean |
var |
Marginal variance |
sd |
Standard deviation |
cv |
Coefficient of variation |
Examples:
pmf_K_marginal()
Marginal PMF via Quadrature Mixture
pmf_K_marginal(J, a, b, logS, M = 80L)Computes the marginal PMF by mixing conditional PMFs over :
Examples:
J <- 50
logS <- compute_log_stirling(J)
pmf_marginal <- pmf_K_marginal(J, a = 1.6, b = 1.22, logS)
# Find mode
mode_k <- which.max(pmf_marginal) - 1
cat("Mode of marginal K:", mode_k, "\n")
#> Mode of marginal K: 3
summary_K_marginal()
Complete Marginal Distribution Summary
summary_K_marginal(J, a, b, logS, M = 80L, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))Returns mean, variance, mode, median, quantiles, PMF, and CDF.
3. Approximation Functions
3.1 DPprior_a1()
A1 Closed-Form Approximation
Fast closed-form solution using the negative binomial approximation to the marginal distribution of .
DPprior_a1(
J, # Sample size
mu_K, # Target mean
var_K = NULL, # Target variance
confidence = NULL, # "low", "medium", "high"
scaling = "log", # "log", "harmonic", "digamma"
project_to_feasible = TRUE
)Key Formulas:
where is the scaling constant (default: ).
Feasibility Constraint: Requires .
Examples:
# Basic A1 fit
fit_a1 <- DPprior_a1(J = 50, mu_K = 5, var_K = 8)
cat("A1 solution: Gamma(", round(fit_a1$a, 4), ", ",
round(fit_a1$b, 4), ")\n", sep = "")
#> A1 solution: Gamma(4, 3.912)
# Compare scaling methods
for (s in c("log", "harmonic", "digamma")) {
fit <- DPprior_a1(50, 5, 8, scaling = s)
cat(sprintf(" %s: cJ = %.4f\n", s, fit$cJ))
}
#> log: cJ = 3.9120
#> harmonic: cJ = 4.4792
#> digamma: cJ = 4.46333.2 DPprior_a2_newton()
A2-MN Newton Solver for Exact Moment Matching
Finds such that the induced marginal moments exactly match targets.
DPprior_a2_newton(
J, # Sample size
mu_K, # Target mean
var_K, # Target variance
a0 = NULL, # Initial shape (from A1 if NULL)
b0 = NULL, # Initial rate (from A1 if NULL)
tol_F = 1e-8, # Residual tolerance
tol_step = 1e-10, # Step size tolerance
max_iter = 20L,
damping = TRUE, # Backtracking line search
use_fallback = TRUE, # Nelder-Mead fallback
M = 80L,
verbose = FALSE
)Algorithm:
- Initialize from A1 closed-form
- Log-parameterize:
- Newton iteration with score-based Jacobian
- Backtracking line search for global convergence
Convergence: Typically achieves machine-precision accuracy in 3-8 iterations.
Examples:
# Exact moment matching
fit <- DPprior_a2_newton(J = 50, mu_K = 5, var_K = 8, verbose = TRUE)
#> A2-MN Newton Solver
#> Target: E[K]=5.0000, Var(K)=8.0000
#> A1 initialization: a0=4.000000, b0=3.912023
#> --------------------------------------------------------------------------------
#> Iter | a | b | E[K] | Var(K) | ||F|| | step | det(J)
#> --------------------------------------------------------------------------------
#> 1 | 4.000000 | 3.912023 | 4.461351 | 4.783136 | 3.26e+00 | 1.0000 | -5.30e+00
#> 2 | 1.178650 | 0.911969 | 4.909046 | 10.854537 | 2.86e+00 | 1.0000 | -2.16e+01
#> 3 | 1.844384 | 1.455254 | 4.974913 | 8.399473 | 4.00e-01 | 1.0000 | -1.53e+01
#> 4 | 2.029223 | 1.599680 | 4.999187 | 8.013243 | 1.33e-02 | 1.0000 | -1.43e+01
#> 5 | 2.036082 | 1.605046 | 4.999999 | 8.000021 | 2.08e-05 | 1.0000 | -1.43e+01
#> 6 | 2.036093 | 1.605054 | 5.000000 | 8.000000 | 7.60e-09 | --- | -1.43e+01
#>
#> Converged: ||F|| = 7.60e-09 < 1.00e-08
# Verify exact matching
achieved <- exact_K_moments(50, fit$a, fit$b)
cat("\nTarget vs Achieved:\n")
#>
#> Target vs Achieved:
cat(" E[K]: 5.000000 vs", sprintf("%.10f", achieved$mean), "\n")
#> E[K]: 5.000000 vs 4.9999999992
cat(" Var(K): 8.000000 vs", sprintf("%.10f", achieved$var), "\n")
#> Var(K): 8.000000 vs 8.00000000763.3 DPprior_a2_kl()
A2-KL Distribution Matching via KL Divergence
Minimizes Kullback-Leibler divergence between a target PMF and the induced marginal PMF of .
DPprior_a2_kl(
J, # Sample size
target, # Target PMF or list(mu_K, var_K)
method = c("pmf", "chisq"), # Target type
max_iter = 100L,
tol = 1e-6,
M = 80L,
verbose = FALSE
)Target Specification:
method |
target Input |
Description |
|---|---|---|
"pmf" |
Numeric vector of length | Direct PMF on |
"chisq" |
list(mu_K = ..., var_K = ...) |
Discretized chi-square matching moments |
Examples:
# Method 1: Moment-based target using chi-square discretization
fit_kl <- DPprior_a2_kl(J = 50, target = list(mu_K = 5, var_K = 8),
method = "chisq")
cat("A2-KL solution: Gamma(", round(fit_kl$a, 4), ", ",
round(fit_kl$b, 4), ")\n", sep = "")
#> A2-KL solution: Gamma(2.2363, 1.7627)
cat("Final KL divergence:", format(fit_kl$fit$kl, scientific = TRUE), "\n")
#> Final KL divergence: 5.029612e-03
# Method 2: Direct PMF target
target_pmf <- discretize_chisq(J = 50, df = 6.25, scale = 0.8)
fit_kl2 <- DPprior_a2_kl(J = 50, target = target_pmf, method = "pmf")4. Weight Distribution Functions
Functions for computing properties of the first stick-breaking weight and the co-clustering probability .
4.1 First Weight Functions
The size-biased first weight follows a compound distribution:
mean_w1()
Marginal Mean of
mean_w1(a, b, M = 80L)Computes via quadrature.
Examples:
mean_w1(a = 1.6, b = 1.22)
#> [1] 0.508368
quantile_w1()
Marginal Quantiles of
quantile_w1(p, a, b, n_grid = 1000, M = 80L)Computes quantiles via numerical inversion of the CDF.
Examples:
# Median of w1
median_w1 <- quantile_w1(0.5, a = 1.6, b = 1.22)
cat("Median(w1) =", round(median_w1, 4), "\n")
#> Median(w1) = 0.4839
prob_w1_exceeds()
Tail Probability
prob_w1_exceeds(x, a, b, M = 80L)Critical for Dominance Assessment: indicates the probability that a single cluster dominates.
Examples:
# Dominance probabilities
a <- 1.6; b <- 1.22
cat("P(w1 > 0.5) =", round(prob_w1_exceeds(0.5, a, b), 4), "\n")
#> P(w1 > 0.5) = 0.4868
cat("P(w1 > 0.7) =", round(prob_w1_exceeds(0.7, a, b), 4), "\n")
#> P(w1 > 0.7) = 0.3334
cat("P(w1 > 0.9) =", round(prob_w1_exceeds(0.9, a, b), 4), "\n")
#> P(w1 > 0.9) = 0.18334.2 Co-Clustering Probability Functions
The co-clustering probability represents the probability that two randomly chosen observations belong to the same cluster.
mean_rho() and var_rho()
Marginal Moments of
Key Identity: , so .
Examples:
a <- 1.6; b <- 1.22
cat("E[rho] =", round(mean_rho(a, b), 4), "\n")
#> E[rho] = 0.5084
cat("E[w1] =", round(mean_w1(a, b), 4), "(same as E[rho])\n")
#> E[w1] = 0.5084 (same as E[rho])
cat("Var(rho) =", round(var_rho(a, b), 4), "\n")
#> Var(rho) = 0.071
cat("Var(w1) =", round(var_w1(a, b), 4), "(different!)\n")
#> Var(w1) = 0.1052 (different!)4.3 Conditional Weight Functions
mean_rho_given_alpha() and
var_rho_given_alpha()
mean_rho_given_alpha(alpha) # Returns 1/(1 + alpha)
var_rho_given_alpha(alpha) # Returns 2*alpha / ((1+alpha)^2 * (2+alpha) * (3+alpha))5. Diagnostic Functions
5.1 DPprior_diagnostics()
Comprehensive Prior Diagnostics
Computes a full diagnostic report implementing the “unintended prior” checks from RN-07.
DPprior_diagnostics(fit, thresholds = c(0.5, 0.9))Parameters:
| Parameter | Type | Description |
|---|---|---|
fit |
DPprior_fit | Fitted object with a, b,
J. |
thresholds |
Numeric | Thresholds for checks. |
Returns: An S3 object of class
DPprior_diagnostics with:
| Component | Description |
|---|---|
alpha |
Summary of distribution (mean, CV, quantiles) |
K |
Summary of distribution (mean, SD, mode, median) |
weights |
distribution and dominance risk assessment |
coclustering |
summary with interpretation |
warnings |
Character vector of diagnostic warnings |
Warning Thresholds:
-
HIGH DOMINANCE RISK: -
NEAR-DEGENERATE RISK:
Examples:
fit <- DPprior_fit(J = 50, mu_K = 5, var_K = 8)
#> Warning: HIGH DOMINANCE RISK: P(w1 > 0.5) = 48.1% exceeds 40%.
#> This may indicate unintended prior behavior (RN-07).
#> Consider using DPprior_dual() for weight-constrained elicitation.
#> See ?DPprior_diagnostics for interpretation.
diag <- DPprior_diagnostics(fit)
print(diag)
#> DPprior Comprehensive Diagnostics
#> ============================================================
#>
#> Prior: alpha ~ Gamma(2.0361, 1.6051) for J = 50
#>
#> alpha Distribution:
#> ----------------------------------------
#> E[alpha] = 1.269, CV(alpha) = 0.701, Median = 1.068
#> 90% CI: [0.230, 2.992]
#>
#> K_J Distribution:
#> ----------------------------------------
#> E[K] = 5.00, SD(K) = 2.83, Mode = 3
#> Median = 5, IQR = [3, 7]
#>
#> w1 Distribution (Size-Biased First Weight):
#> ----------------------------------------
#> E[w1] = 0.501, Median = 0.478
#> P(w1 > 0.5) = 48.1% (dominance risk: HIGH)
#> P(w1 > 0.9) = 16.3%
#>
#> Co-Clustering (rho = sum w_h^2):
#> ----------------------------------------
#> E[rho] = 0.501 (High prior co-clustering: most unit pairs expected in same cluster)
#>
#> WARNINGS:
#> ----------------------------------------
#> * HIGH DOMINANCE RISK: P(w1 > 0.5) = 48.1% exceeds 40%
#> * NEAR-DEGENERATE RISK: P(w1 > 0.9) = 16.3% exceeds 15%
#>
#> Consider using DPprior_dual() for weight-constrained elicitation.5.2 S3 Methods for DPprior_fit
summary.DPprior_fit()
summary(fit, print_output = TRUE)Returns: An object of class
summary.DPprior_fit with detailed statistics.
5.3 Individual Plot Functions
plot_alpha_prior(fit) # Alpha distribution only
plot_K_prior(fit) # K distribution only
plot_w1_prior(fit) # w1 distribution with dominance5.4 Dominance Risk Functions
check_dominance_risk()
check_dominance_risk(a, b, threshold = 0.5, risk_level = 0.3)Returns TRUE if
.
compute_K_diagnostics()
compute_K_diagnostics(J, a, b, M = 80L)Returns mean, SD, mode, median, and quantiles of .
compute_weight_diagnostics()
compute_weight_diagnostics(a, b, thresholds = c(0.5, 0.9), M = 80L)Returns summary with dominance risk classification.
6. Utility Functions
6.1 Variance Conversion Functions
vif_to_variance()
Convert Variance Inflation Factor to Variance
vif_to_variance(mu_K, vif)Computes , based on the marginal overdispersion relationship.
Examples:
# VIF = 2 means variance is twice the Poisson-like baseline
var_K <- vif_to_variance(mu_K = 5, vif = 2)
cat("var_K =", var_K, "\n")
#> var_K = 8
confidence_to_vif()
Map Confidence Level to VIF
confidence_to_vif(confidence)| Confidence | VIF | Interpretation |
|---|---|---|
"low" |
4.0 | High uncertainty |
"medium" |
2.5 | Moderate uncertainty |
"high" |
1.5 | Low uncertainty |
6.2 Scaling Functions
compute_scaling_constant()
compute_scaling_constant(J, scaling = "log", mu_K = NULL)Computes the scaling constant used in the A1 approximation:
scaling |
Formula |
|---|---|
"log" |
|
"harmonic" |
|
"digamma" |
6.3 Quadrature Functions
gauss_laguerre_nodes()
Compute Gauss-Laguerre Quadrature Nodes and Weights
gauss_laguerre_nodes(M, alpha_param = 0)Parameters:
| Parameter | Type | Description |
|---|---|---|
M |
Integer | Number of quadrature nodes. |
alpha_param |
Numeric | Generalized Laguerre parameter (for Gamma(a,b), use ). |
Returns: List with nodes,
weights, and weights_log.
build_gamma_quadrature()
Build Quadrature for Gamma Distribution
build_gamma_quadrature(a, b, M = 80L)Transforms standard Laguerre quadrature to integrate against Gamma(a,b).
Returns: List with alpha_nodes and
weights_normalized.
integrate_gamma()
Compute Expectation under Gamma Distribution
integrate_gamma(f, a, b, M = 80L)Computes using Gauss-Laguerre quadrature.
Examples:
# E[alpha] = a/b
a <- 2; b <- 0.5
E_alpha <- integrate_gamma(identity, a, b)
cat("E[alpha] via quadrature:", round(E_alpha, 6), "\n")
#> E[alpha] via quadrature: 4
cat("E[alpha] exact (a/b):", a/b, "\n")
#> E[alpha] exact (a/b): 46.6 KL Divergence Functions
discretize_chisq()
Create Target PMF from Chi-Square
discretize_chisq(J, df, scale = 1)Discretizes a (scaled) chi-square distribution onto .
7. Verification Functions
The package includes extensive verification functions for development and testing. These are typically not needed by end users but are available for those who wish to validate computations.
7.1 Module Verification Functions
| Function | Description |
|---|---|
validate_stirling(logS) |
Verify Stirling numbers against known values |
verify_stirling_row_sum(logS) |
Verify |
verify_underdispersion(J, alpha_values) |
Verify |
verify_pmf_moments(J, alpha, logS) |
Verify PMF matches closed-form moments |
verify_marginal_moments(J, a, b) |
Verify marginal moment properties |
verify_a1_roundtrip(fit) |
Verify A1 parameters recover targets |
verify_a2_moment_matching(J, mu_K, var_K) |
Verify A2 achieves exact matching |
Examples:
# Verify Stirling numbers
logS <- compute_log_stirling(20)
validate_stirling(logS, verbose = TRUE)
#> PASS: |s(4,2)| = 11 (expected 11)
#> PASS: |s(5,3)| = 35 (expected 35)
#> PASS: |s(6,3)| = 225 (expected 225)
#> PASS: |s(10,5)| = 269325 (expected 269325)
#> [1] TRUE
# Verify underdispersion
verify_underdispersion(50, alpha_values = c(0.5, 1, 2, 5))
#> Underdispersion verification (J=50):
#> alpha= 0.50: E[K]= 2.9378, Var(K)= 1.7091, D=0.5818 [PASS]
#> alpha= 1.00: E[K]= 4.4992, Var(K)= 2.8741, D=0.6388 [PASS]
#> alpha= 2.00: E[K]= 7.0376, Var(K)= 4.5356, D=0.6445 [PASS]
#> alpha= 5.00: E[K]= 12.4605, Var(K)= 7.3861, D=0.5928 [PASS]8. Function Quick Reference
By Task
“I want to elicit a prior based on expected clusters”
fit <- DPprior_fit(J = 50, mu_K = 5, confidence = "medium")“I want to control both clusters AND weight concentration”
fit_K <- DPprior_fit(J = 50, mu_K = 5, var_K = 8)
fit <- DPprior_dual(fit_K, list(prob = list(threshold = 0.5, value = 0.3)))“I want to check for dominance risk”
prob_w1_exceeds(0.5, fit$a, fit$b)
DPprior_diagnostics(fit)“I want to compute the exact PMF of K”
logS <- compute_log_stirling(J)
pmf <- pmf_K_marginal(J, fit$a, fit$b, logS)“I want to visualize my prior”
plot(fit) # Full dashboard
plot_K_prior(fit) # K distribution onlyAlphabetical Index
| Function | Category | Description |
|---|---|---|
build_gamma_quadrature() |
Utility | Quadrature for Gamma |
cdf_K_given_alpha() |
Exact | Conditional CDF |
cdf_w1() |
Weights | CDF of |
check_dominance_risk() |
Diagnostics | Risk assessment |
compute_log_stirling() |
Exact | Stirling numbers |
compute_scaling_constant() |
Utility | Scaling |
confidence_to_vif() |
Utility | Confidence to VIF |
cv_alpha_to_variance() |
Utility | CV to variance |
cv_rho() |
Weights | CV of |
discretize_chisq() |
Utility | Chi-square to PMF |
DPprior_a1() |
Approximation | A1 closed-form |
DPprior_a2_kl() |
Approximation | A2-KL solver |
DPprior_a2_newton() |
Approximation | A2-MN Newton |
DPprior_diagnostics() |
Diagnostics | Full diagnostics |
DPprior_dual() |
Core | Dual-anchor elicitation |
DPprior_fit() |
Core | Main entry point |
exact_K_moments() |
Exact | Marginal moments |
gauss_laguerre_nodes() |
Utility | Quadrature nodes |
get_log_stirling() |
Exact | Safe Stirling access |
get_stirling_row() |
Exact | Row of Stirling |
integrate_gamma() |
Utility | Expectation under Gamma |
kl_divergence_K() |
Utility | KL for K |
kl_divergence_pmf() |
Utility | KL between PMFs |
log_rising_factorial() |
Utility | Log Pochhammer |
logsumexp() |
Utility | Binary log-sum-exp |
logsumexp_vec() |
Utility | Vector log-sum-exp |
mean_K_given_alpha() |
Exact | Conditional mean |
mean_rho() |
Weights | Marginal |
mean_rho_given_alpha() |
Weights | Conditional |
mean_w1() |
Weights | Marginal |
moments_K_given_alpha() |
Exact | Both conditional moments |
pdf_w1() |
Weights | PDF of |
plot.DPprior_fit() |
Diagnostics | Dashboard plot |
plot_alpha_prior() |
Diagnostics | Alpha plot |
plot_K_prior() |
Diagnostics | K plot |
plot_w1_prior() |
Diagnostics | plot |
pmf_K_given_alpha() |
Exact | Conditional PMF |
pmf_K_marginal() |
Exact | Marginal PMF |
print.DPprior_fit() |
Diagnostics | Print method |
prob_w1_exceeds() |
Weights | Tail probability |
quantile_K_given_alpha() |
Exact | Conditional quantile |
quantile_w1() |
Weights | Quantiles of |
softmax() |
Utility | Stable softmax |
summary.DPprior_fit() |
Diagnostics | Summary method |
summary_K_marginal() |
Exact | Full K summary |
var_K_given_alpha() |
Exact | Conditional variance |
var_rho() |
Weights | Marginal |
var_rho_given_alpha() |
Weights | Conditional |
var_w1() |
Weights | Marginal |
vif_to_variance() |
Utility | VIF to variance |
References
Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6), 1152-1174.
Golub, G. H., & Welsch, J. H. (1969). Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221-230.
Lee, J. (2025). RN-01 through RN-07: Research Notes for DPprior Package Development.
Lee, J., Che, J., Rabe-Hesketh, S., Feller, A., & Miratrix, L. (2025). Improving the estimation of site-specific effects and their distribution in multisite trials. Journal of Educational and Behavioral Statistics, 50(5), 731-764.
For additional documentation, see the package vignettes or visit the GitHub repository.