Skip to contents

Computes \(P(K_J = k \mid a, b)\) for \(k = 0, 1, \ldots, J\) when \(\alpha \sim \mathrm{Gamma}(a, b)\) (shape-rate parameterization).

Usage

pmf_K_marginal(J, a, b, logS, M = .QUAD_NODES_DEFAULT)

Arguments

J

Integer; sample size (positive integer >= 1).

a

Numeric; shape parameter of Gamma prior (> 0).

b

Numeric; rate parameter of Gamma prior (> 0).

logS

Matrix; pre-computed log-Stirling matrix from compute_log_stirling.

M

Integer; number of quadrature nodes (default: 80).

Value

Numeric vector of length \(J+1\) containing \(P(K_J = k \mid a, b)\) for \(k = 0, 1, \ldots, J\). Entry [1] corresponds to \(k=0\) and always equals 0. The vector sums to 1.

Details

Uses Gauss-Laguerre quadrature to numerically evaluate: $$P(K_J = k \mid a, b) = \int_0^\infty P(K_J = k \mid \alpha) \cdot g_{a,b}(\alpha) d\alpha$$ $$\approx \sum_{m=1}^M \tilde{w}_m \cdot P(K_J = k \mid \alpha_m)$$

where \(P(K_J = k \mid \alpha)\) is the Antoniak distribution from Module 04 and \((\alpha_m, \tilde{w}_m)\) are the transformed quadrature nodes and normalized weights from Module 02.

Implementation: All mixing is performed in log-space for numerical stability. This is critical for large J or extreme parameter values.

Key properties:

  • \(P(K_J = 0) = 0\) always (at least one cluster exists)

  • The PMF sums to 1

  • Moments from the PMF match exact_K_moments() within numerical tolerance

  • Mode is typically near \(E[K_J]\) but may differ

References

Antoniak, C. E. (1974). Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, 2(6), 1152-1174.

See also

log_pmf_K_marginal for log-scale computation, pmf_K_given_alpha for conditional PMF, exact_K_moments for marginal moments, cdf_K_marginal, quantile_K_marginal, mode_K_marginal, summary_K_marginal

Examples

# Pre-compute Stirling numbers
logS <- compute_log_stirling(50)

# Compute marginal PMF for J=50, Gamma(1.5, 0.5) prior
pmf <- pmf_K_marginal(50, 1.5, 0.5, logS)

# Verify normalization
sum(pmf)
#> [1] 1

# Most likely number of clusters
which.max(pmf) - 1
#> [1] 6

# Compare mean with exact_K_moments
k_vals <- 0:50
mean_pmf <- sum(k_vals * pmf)
exact <- exact_K_moments(50, 1.5, 0.5)
abs(mean_pmf - exact$mean)
#> [1] 7.105427e-15