Skip to contents

Computes an upper bound on the conditional total variation distance between the shifted cluster count \(S_J = K_J - 1\) and a Poisson law with the same mean (Poissonization error).

Usage

compute_poissonization_bound(J, alpha, raw = FALSE)

Arguments

J

Integer; sample size (number of observations).

alpha

Numeric; concentration parameter (can be vectorized).

raw

Logical; if TRUE, return just sum(p_i^2) without the prefactor. Default is FALSE.

Value

Numeric vector; upper bound on \(d_{TV}(S_J | \alpha, \text{Poisson}(\lambda_J(\alpha)))\).

Details

Under the CRP representation, \(S_J = \sum_{i=2}^J I_i\) where \(I_i \sim \text{Bernoulli}(p_i)\) and \(p_i = \alpha / (\alpha + i - 1)\).

A standard Chen-Stein/Le Cam bound gives: $$d_{TV}(S_J, \text{Poisson}(\lambda)) \le \frac{1 - e^{-\lambda}}{\lambda} \sum_{i=2}^J p_i^2$$ where \(\lambda = \sum_{i=2}^J p_i = E[S_J | \alpha]\).

The prefactor \((1 - e^{-\lambda})/\lambda\) is always in (0, 1] and approaches 1 as \(\lambda \to 0\). This provides a tighter bound than simply using \(\sum p_i^2\) alone.

The returned value is capped at 1 (since total variation is always between 0 and 1).

References

Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10(4), 1181-1197.

Chen, L. H. Y. (1975). Poisson approximation for dependent trials. The Annals of Probability, 3(3), 534-545.

RN-05 (Theorem 1) and references therein.

Examples

# Full Chen-Stein bound
compute_poissonization_bound(J = 50, alpha = 1)
#> [1] 0.1732509

# Raw bound (sum of p_i^2)
compute_poissonization_bound(J = 50, alpha = 1, raw = TRUE)
#> [1] 0.6251327

# Vectorized
compute_poissonization_bound(J = 50, alpha = c(0.5, 1, 2, 5))
#> [1] 0.1010243 0.1732509 0.2481908 0.3555110