Quick Start: Your First Variance Decomposition in 5 Minutes
JoonHo Lee
2026-01-24
Source:vignettes/quick-start.Rmd
quick-start.RmdOverview
This vignette demonstrates the fastest path to performing variance decomposition with the bhfvar package. By the end of this 5-minute tutorial, you will be able to:
- Prepare complex survey data for analysis
- Fit the Bayesian Hybrid Framework model
- Extract and interpret variance decomposition results
- Obtain domain-specific estimates with appropriate shrinkage
1. Minimal Example
Let’s work through a complete analysis using the package’s synthetic dataset. This dataset mimics the structure of the National Survey of Early Care and Education (NSECE), with 50 states, 27 strata, and 356 PSUs.
library(bhfvar)
library(rstan)
# Enable parallel processing
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
# Step 1: Compile the Stan model (once per session)
model <- compile_bhf_model()
# Step 2: Load and prepare the data
data(bhf_synthetic_data)
prepared <- prepare_bhf_data(
data = bhf_synthetic_data,
outcome = "has_subsidy",
domain = "state",
strata = "stratum",
psu = "psu",
weights = "weight"
)
# Step 3: Fit the model
fit <- bhf_fit(prepared, model = model, chains = 4, iter = 2000)
# Step 4: Get results
variance_decomposition(fit)That’s it! You now have a complete Bayesian variance decomposition.
2. Understanding the Data
Let’s examine the synthetic dataset:
data(bhf_synthetic_data)
# Structure
str(bhf_synthetic_data)
# First few rows
head(bhf_synthetic_data)
# Summary statistics
cat("Total observations:", nrow(bhf_synthetic_data), "\n")
cat("Number of states:", length(unique(bhf_synthetic_data$state)), "\n")
cat("Number of strata:", length(unique(bhf_synthetic_data$stratum)), "\n")
cat("Number of PSUs:", length(unique(bhf_synthetic_data$psu)), "\n")
cat("Overall proportion with subsidy:",
round(mean(bhf_synthetic_data$has_subsidy), 3), "\n")The dataset contains:
| Variable | Description |
|---|---|
state |
Domain identifier (State_01 to State_50) |
stratum |
Design stratum (Stratum_01 to Stratum_27) |
psu |
Primary Sampling Unit within stratum |
weight |
Survey weight |
has_subsidy |
Binary outcome (1 = receives subsidy, 0 = does not) |
3. Preparing Data for Stan
The prepare_bhf_data() function transforms your data
into the format required by the Stan model:
prepared <- prepare_bhf_data(
data = bhf_synthetic_data,
outcome = "has_subsidy",
domain = "state",
strata = "stratum",
psu = "psu",
weights = "weight",
use_deattenuation = TRUE # Enable de-attenuation (default)
)
# Print summary
print(prepared)The output shows:
- Sample size and number of domains/strata/PSUs
- Outcome proportion
- Whether de-attenuation is enabled
4. Fitting the Model
The bhf_fit() function fits the Bayesian Hybrid
Framework model via Stan:
fit <- bhf_fit(
data = prepared,
model = model,
chains = 4, # Number of MCMC chains
iter = 2000, # Total iterations per chain
warmup = 1000, # Warmup iterations (discarded)
seed = 42 # For reproducibility
)
# Print fit summary
print(fit)The printed output includes:
- Data dimensions
- Sampling settings
- Convergence diagnostics (divergences, Rhat, ESS)
- Key parameter estimates
5. Extracting Results
5.1 Variance Decomposition
The main result is the variance decomposition across three estimands:
vd <- variance_decomposition(fit)
# The function prints a formatted summary and returns detailed results
# Access specific components:
cat("ICC on logit scale (Estimand A):",
round(vd$logit$icc_mean, 4), "\n")
cat("ICC on probability scale (Estimand B):",
round(vd$prob$icc_mean, 4), "\n")
cat("ICC de-attenuated (Estimand A*):",
round(vd$deatten$icc_mean, 4), "\n")5.2 Domain Estimates
Get estimates for each domain (state):
# Marginal estimates (integrating out within-domain variation)
estimates <- domain_estimates(fit, type = "marginal")
# View the results
head(estimates)
# Key columns:
# - domain: Domain name
# - mean: Posterior mean probability
# - q025, q975: 95% credible interval
# - reliability: Shrinkage factor (higher = less shrinkage)
# - pop_share: Population share of this domain5.3 Overall Estimate
Get the population-weighted overall proportion:
overall <- overall_estimate(fit)
cat("Overall proportion:", round(overall$mean, 3),
"(95% CI:", round(overall$q025, 3), "-", round(overall$q975, 3), ")\n")6. Quick Diagnostic Check
Always verify convergence before interpreting results:
# Check the fit object's diagnostics
cat("Divergent transitions:", fit$diagnostics$n_divergent, "\n")
cat("Max Rhat:", round(fit$diagnostics$rhat_max, 3), "\n")
cat("Min ESS:", fit$diagnostics$ess_min, "\n")
# Rules of thumb:
# - Divergent transitions should be 0
# - Rhat should be < 1.01 (ideally < 1.001)
# - ESS should be > 400 for reliable inference7. Complete Workflow Summary
Here’s the complete workflow in a single code block:
# ============================================================
# bhfvar Quick Start: Complete Workflow
# ============================================================
library(bhfvar)
library(rstan)
# Setup
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
# 1. Compile model
model <- compile_bhf_model()
# 2. Load data
data(bhf_synthetic_data)
# 3. Prepare data
prepared <- prepare_bhf_data(
data = bhf_synthetic_data,
outcome = "has_subsidy",
domain = "state",
strata = "stratum",
psu = "psu",
weights = "weight"
)
# 4. Fit model
fit <- bhf_fit(prepared, model = model, chains = 4, iter = 2000)
# 5. Results
variance_decomposition(fit) # Variance decomposition
estimates <- domain_estimates(fit) # Domain-specific estimates
overall <- overall_estimate(fit) # Overall estimate
# 6. Check diagnostics
print(fit)What’s Next?
This quick start covered the essentials. For more detailed guidance:
- Complete Workflow: Full analysis with interpretation and visualization
- Methodology: Understanding the Bayesian Hybrid Framework
- Dual Estimands: Deep dive into estimands A, B, and A*
- Diagnostics: Comprehensive convergence checking
Summary Table
| Step | Function | Purpose |
|---|---|---|
| 1 | compile_bhf_model() |
Compile Stan model (once per session) |
| 2 | prepare_bhf_data() |
Transform data for Stan |
| 3 | bhf_fit() |
Fit the Bayesian model |
| 4 | variance_decomposition() |
Get ICC across estimands |
| 5 | domain_estimates() |
Get domain-specific results |
| 6 | overall_estimate() |
Get population-weighted overall |
For questions or feedback, please visit the GitHub repository.