Algorithm 1: Empirical Quadrature Calibration (EQC)
Source:vignettes/eqc_algorithm.Rmd
eqc_algorithm.RmdOverview
Empirical Quadrature Calibration (EQC) is the primary algorithm in IRTsimrel for reliability-targeted simulation. Given a target marginal reliability , EQC finds a global discrimination scaling factor such that the population reliability equals the target.
Why EQC is Recommended
| Feature | EQC Advantage |
|---|---|
| Speed | Deterministic root-finding with convergence |
| Accuracy | Typically achieves target within ±0.01 |
| Simplicity | No step-size tuning or learning rates |
| Reliability | Guaranteed convergence via Brent’s method |
For most simulation studies, EQC is all you need. Use SPC (Algorithm 2) only for validation or specialized applications.
Mathematical Foundation
The Inverse Problem
Forward problem: Given item parameters and a scaling factor , compute reliability .
Inverse problem: Given target reliability , find such that .
Discrimination Scaling
EQC solves the inverse problem by scaling all item discriminations by a common factor:
where are baseline discriminations from the item generator.
Why this works:
- Separates informativeness from structure: Changing modifies reliability without altering item difficulty distributions or latent trait shape
- Monotonic relationship: Reliability is strictly increasing in
- Reduces dimensionality: Multi-dimensional design space becomes a 1D root-finding problem
Algorithm Steps
EQC operates in three steps:
Step 1: Generate Quadrature Samples
Draw large fixed samples from the latent and item parameter distributions:
- (latent abilities)
- (item parameters)
These samples serve as an empirical quadrature rule for approximating population expectations.
Step 2: Define the Reliability Function
For any scale , compute the empirical reliability:
This approximates the population reliability by the Law of Large Numbers.
Step 3: Root-Finding
Solve the scalar equation:
using Brent’s method (uniroot() in R), which combines
bisection, secant, and inverse quadratic interpolation for robust
convergence.
Basic Usage
# Basic EQC calibration
eqc_result <- eqc_calibrate(
target_rho = 0.80,
n_items = 25,
model = "rasch",
latent_shape = "normal",
item_source = "irw",
seed = 42
)
#> Note: Target rho* = 0.800 is near the achievable maximum (0.824) for this configuration.
print(eqc_result)
#>
#> =======================================================
#> Empirical Quadrature Calibration (EQC) Results
#> =======================================================
#>
#> Calibration Summary:
#> Model : RASCH
#> Target reliability (rho*) : 0.8000
#> Achieved reliability : 0.8000
#> Absolute error : 2.27e-06
#> Scaling factor (c*) : 0.9082
#>
#> Design Parameters:
#> Number of items (I) : 25
#> Quadrature points (M) : 10000
#> Reliability metric : MSEM-based (bar/w)
#> Latent variance : 1.0123
#>
#> Convergence:
#> Root status : uniroot_success
#> Search bracket : [0.300, 3.000]
#> Bracket reliabilities : [0.3555, 0.8240]
#>
#> Parameter Summaries:
#> theta: mean = -0.011, sd = 1.006
#> beta: mean = 0.000, sd = 0.670, range = [-1.48, 1.06]
#> lambda_base: mean = 1.000, sd = 0.000
#> lambda_scaled: mean = 0.908, sd = 0.000Understanding the Output
The eqc_result object contains:
| Component | Description |
|---|---|
c_star |
Calibrated scaling factor |
target_rho |
Target reliability |
achieved_rho |
Achieved reliability |
metric |
Reliability metric used (msem or
info) |
theta_quad |
Quadrature sample of abilities |
beta_vec |
Item difficulties |
lambda_base |
Baseline discriminations |
lambda_scaled |
Calibrated discriminations |
items_base |
Item parameters before calibration |
items_calib |
Item parameters after calibration |
Accessing Calibrated Parameters
# The calibrated scaling factor
cat(sprintf("c* = %.4f\n", eqc_result$c_star))
#> c* = 0.9082
# Item difficulties
head(eqc_result$beta_vec)
#> [1] 0.1312995 -1.2945477 0.3836801 -0.6358214 0.5071690 0.4314725
# Calibrated discriminations
head(eqc_result$lambda_scaled)
#> [1] 0.9082186 0.9082186 0.9082186 0.9082186 0.9082186 0.9082186
# Full calibrated item parameters
head(eqc_result$items_calib$data)
#> form_id item_id beta lambda lambda_unscaled
#> 1 1 1 0.1312995 0.9082186 1
#> 2 1 2 -1.2945477 0.9082186 1
#> 3 1 3 0.3836801 0.9082186 1
#> 4 1 4 -0.6358214 0.9082186 1
#> 5 1 5 0.5071690 0.9082186 1
#> 6 1 6 0.4314725 0.9082186 1Convergence Information
# Root-finding status
cat(sprintf("Root status: %s\n", eqc_result$misc$root_status))
#> Root status: uniroot_success
# Achievable reliability range
cat(sprintf("Reliability at c_lower (%.2f): %.4f\n",
eqc_result$misc$c_bounds[1],
eqc_result$misc$rho_bounds["rho_L"]))
#> Reliability at c_lower (0.30): 0.3555
cat(sprintf("Reliability at c_upper (%.2f): %.4f\n",
eqc_result$misc$c_bounds[2],
eqc_result$misc$rho_bounds["rho_U"]))
#> Reliability at c_upper (3.00): 0.8240Key Parameters
Target Reliability (target_rho)
The desired marginal reliability, typically between 0.50 and 0.90:
# Low reliability (formative assessment)
eqc_low <- eqc_calibrate(target_rho = 0.60, n_items = 20, seed = 1)
# Moderate reliability (typical research)
eqc_mod <- eqc_calibrate(target_rho = 0.75, n_items = 20, seed = 1)
# High reliability (standardized test)
eqc_high <- eqc_calibrate(target_rho = 0.85, n_items = 20, seed = 1)Quadrature Size (M)
The number of latent ability samples for empirical integration. Larger reduces Monte Carlo error:
# Default: M = 10,000
eqc_default <- eqc_calibrate(
target_rho = 0.80, n_items = 25, M = 10000, seed = 42
)
#> Note: Target rho* = 0.800 is near the achievable maximum (0.824) for this configuration.
# Higher precision: M = 50,000
eqc_precise <- eqc_calibrate(
target_rho = 0.80, n_items = 25, M = 50000, seed = 42
)
cat(sprintf("M = 10,000: c* = %.6f\n", eqc_default$c_star))
#> M = 10,000: c* = 0.908219
cat(sprintf("M = 50,000: c* = %.6f\n", eqc_precise$c_star))
#> M = 50,000: c* = 1.156505Recommendation: for routine use; for high-precision work.
Search Bounds (c_bounds)
The interval for root-finding:
# Default bounds
eqc_default <- eqc_calibrate(
target_rho = 0.80, n_items = 25,
c_bounds = c(0.3, 3), # Default
seed = 42
)
#> Note: Target rho* = 0.800 is near the achievable maximum (0.824) for this configuration.
# Extended bounds for high reliability
eqc_extended <- eqc_calibrate(
target_rho = 0.85, n_items = 25,
c_bounds = c(0.1, 5),
seed = 42
)
#> Warning in eqc_calibrate(target_rho = 0.85, n_items = 25, c_bounds = c(0.1, : Target rho* = 0.850 exceeds the maximum achievable reliability for this configuration.
#> - Items: 25
#> - At c = 5.00 (upper bound): rho = 0.0440
#> - Gap: 0.8060
#>
#> Suggestions:
#> 1. Use reliability_metric = 'info' (typically yields higher values)
#> 2. Increase c_bounds[2] beyond 5.0
#> 3. Increase n_items for higher achievable reliability
#> 4. Accept achieved rho = 0.0440 as maximum for this design
#>
#> Returning c_star = 5.00 (upper bound).When to extend bounds:
- Target reliability near achievable maximum → increase
c_bounds[2] - Very low target reliability → decrease
c_bounds[1]
Reliability Metrics
EQC supports two reliability definitions:
MSEM-Based Reliability ()
The theoretically exact marginal reliability based on the harmonic mean of test information:
eqc_msem <- eqc_calibrate(
target_rho = 0.80,
n_items = 25,
reliability_metric = "msem", # Default, or use "bar"
seed = 42
)
#> Note: Target rho* = 0.800 is near the achievable maximum (0.824) for this configuration.Average-Information Reliability ()
Based on the arithmetic mean of test information:
eqc_info <- eqc_calibrate(
target_rho = 0.80,
n_items = 25,
reliability_metric = "info", # Or use "tilde"
seed = 42
)Key Relationship
By Jensen’s inequality, always holds:
cat(sprintf("MSEM metric c*: %.4f\n", eqc_msem$c_star))
#> MSEM metric c*: 0.9082
cat(sprintf("Info metric c*: %.4f\n", eqc_info$c_star))
#> Info metric c*: 0.8830The info metric typically requires a
smaller
to achieve the same target, because it yields higher reliability values
for a given
.
Recommendation:
- Use
"msem"(default) for theoretical correctness - Use
"info"when targeting very high reliability that"msem"cannot achieve
Working with Different Models
Rasch Model
eqc_rasch <- eqc_calibrate(
target_rho = 0.75,
n_items = 30,
model = "rasch",
seed = 42
)
# All baseline discriminations are 1
unique(eqc_rasch$lambda_base)
#> [1] 12PL Model
eqc_2pl <- eqc_calibrate(
target_rho = 0.75,
n_items = 30,
model = "2pl",
item_source = "irw",
item_params = list(
discrimination_params = list(
mu_log = 0,
sigma_log = 0.3,
rho = -0.3
)
),
seed = 42
)
# Discriminations vary across items
summary(eqc_2pl$lambda_base)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.6887 0.7896 0.9104 1.0097 1.2525 1.6613Working with Different Latent Distributions
# Normal distribution
eqc_normal <- eqc_calibrate(
target_rho = 0.80, n_items = 25,
latent_shape = "normal",
seed = 42
)
# Bimodal distribution
eqc_bimodal <- eqc_calibrate(
target_rho = 0.80, n_items = 25,
latent_shape = "bimodal",
latent_params = list(delta = 0.8),
seed = 42
)
# Heavy-tailed distribution
eqc_heavy <- eqc_calibrate(
target_rho = 0.80, n_items = 25,
latent_shape = "heavy_tail",
latent_params = list(df = 5),
seed = 42
)Different latent shapes require different values to achieve the same target reliability:
Generating Response Data
After calibration, generate item response data:
# Generate response data
sim_data <- simulate_response_data(
eqc_result = eqc_result,
n_persons = 1000,
latent_shape = "normal",
seed = 123
)
# Response matrix dimensions
dim(sim_data$response_matrix)
#> [1] 1000 25
# First few responses
head(sim_data$response_matrix[, 1:6])
#> item1 item2 item3 item4 item5 item6
#> [1,] 0 1 0 1 0 1
#> [2,] 0 0 1 0 0 0
#> [3,] 1 1 1 1 1 1
#> [4,] 1 1 1 1 0 1
#> [5,] 0 1 0 1 0 0
#> [6,] 1 1 1 1 1 0
# True abilities
head(sim_data$theta)
#> [1] -0.56047565 -0.23017749 1.55870831 0.07050839 0.12928774 1.71506499Validation with TAM
Validate achieved reliability using the TAM package:
# Compute WLE and EAP reliability
tam_rel <- compute_reliability_tam(
resp = sim_data$response_matrix,
model = "rasch"
)
cat(sprintf("Target reliability: %.4f\n", eqc_result$target_rho))
cat(sprintf("EQC achieved rho: %.4f\n", eqc_result$achieved_rho))
cat(sprintf("TAM WLE reliability: %.4f\n", tam_rel$rel_wle))
cat(sprintf("TAM EAP reliability: %.4f\n", tam_rel$rel_eap))Understanding WLE vs EAP Reliability
An important validation finding: TAM’s EAP reliability is systematically higher than WLE reliability. This is not a bug but reflects different mathematical definitions:
| Metric | Definition | Typical Pattern |
|---|---|---|
| WLE | Design-effect based: | Conservative (lower bound) |
| EAP | Posterior variance based: | Liberal (upper bound) |
Practical interpretation:
- WLE reliability ≈ lower bound for true measurement precision
- EAP reliability ≈ upper bound for true measurement precision
- EQC’s theoretical reliability falls between these bounds
Handling Boundary Cases
Target Too High
When the target reliability exceeds what’s achievable with the given configuration:
# Try to achieve very high reliability with few items
eqc_high <- eqc_calibrate(
target_rho = 0.95,
n_items = 15,
model = "rasch",
c_bounds = c(0.3, 3),
seed = 42
)
#> Warning in eqc_calibrate(target_rho = 0.95, n_items = 15, model = "rasch", : Target rho* = 0.950 exceeds the maximum achievable reliability for this configuration.
#> - Items: 15
#> - At c = 3.00 (upper bound): rho = 0.7054
#> - Gap: 0.2446
#>
#> Suggestions:
#> 1. Use reliability_metric = 'info' (typically yields higher values)
#> 2. Increase c_bounds[2] beyond 3.0
#> 3. Increase n_items for higher achievable reliability
#> 4. Accept achieved rho = 0.7054 as maximum for this design
#>
#> Returning c_star = 3.00 (upper bound).The function returns c_star = c_bounds[2] (upper bound)
with a warning.
Solutions:
- Use
reliability_metric = "info"(yields higher values) - Increase
c_bounds[2] - Increase
n_items - Accept the achievable maximum
Target Too Low
When the target is below the minimum achievable:
# Very low target with high discrimination items
eqc_low <- eqc_calibrate(
target_rho = 0.30,
n_items = 50,
model = "2pl",
c_bounds = c(0.3, 3),
seed = 42
)The function returns c_star = c_bounds[1] (lower bound)
with a warning.
Solution: Decrease c_bounds[1].
Verbose Mode
Enable verbose output to see algorithm progress:
eqc_verbose <- eqc_calibrate(
target_rho = 0.80,
n_items = 25,
model = "rasch",
seed = 42,
verbose = TRUE
)
#> Step 1: Generating quadrature samples...
#> M (quad persons) = 10000
#> I (items) = 25
#> theta: mean = -0.011, sd = 1.006, var = 1.012
#> beta: mean = 0.000, sd = 0.670
#> lambda_base: mean = 1.000, sd = 0.000
#> metric = msem
#> Step 2: Running root-finding algorithm...
#> At c = 0.300: rho = 0.3555, g = -0.4445
#> At c = 3.000: rho = 0.8240, g = 0.0240
#> Note: Target rho* = 0.800 is near the achievable maximum (0.824) for this configuration.
#> c* = 0.908219
#> Target rho = 0.8000
#> Achieved rho = 0.8000
#> Root status = uniroot_successValidation with SPC
For rigorous validation, compare EQC results with the SPC algorithm:
# Run EQC
eqc_result <- eqc_calibrate(
target_rho = 0.80,
n_items = 25,
model = "rasch",
seed = 42
)
# Validate with SPC (warm start from EQC)
spc_result <- spc_calibrate(
target_rho = 0.80,
n_items = 25,
model = "rasch",
c_init = eqc_result, # Use EQC for warm start
n_iter = 200,
seed = 42
)
# Compare
compare_eqc_spc(eqc_result, spc_result)When EQC and SPC agree (typically within 5%), you can be confident in your calibration.
Complete Workflow Example
# ==== 1. Define simulation parameters ====
target_rho <- 0.80
n_items <- 25
n_persons <- 1000
n_replications <- 100
# ==== 2. Calibrate once using EQC ====
eqc_result <- eqc_calibrate(
target_rho = target_rho,
n_items = n_items,
model = "rasch",
latent_shape = "normal",
item_source = "irw",
M = 20000,
seed = 42,
verbose = TRUE
)
cat(sprintf("\nCalibrated c* = %.4f for target rho* = %.2f\n\n",
eqc_result$c_star, target_rho))
# ==== 3. Run simulation replications ====
results <- vector("list", n_replications)
for (r in 1:n_replications) {
# Generate data
sim_data <- simulate_response_data(
eqc_result = eqc_result,
n_persons = n_persons,
latent_shape = "normal",
seed = r
)
# Your analysis here...
# results[[r]] <- your_analysis(sim_data)
if (r %% 10 == 0) cat(sprintf("Completed replication %d\n", r))
}
# ==== 4. Summarize results ====
# summary_stats <- summarize_results(results)Troubleshooting Guide
| Problem | Possible Cause | Solution |
|---|---|---|
| Warning: upper bound hit | Target too high | Use metric = "info", increase c_bounds[2],
or increase n_items
|
| Warning: lower bound hit | Target too low | Decrease c_bounds[1]
|
| High Monte Carlo error | Small M
|
Increase M to 20,000+ |
| WLE reliability too low | Normal TAM behavior | EAP provides upper bound; WLE is conservative |
| Different results with same seed | Item parameter resampling | Fix item parameters or use same seed consistently |
Theoretical Properties
Convergence Rate
EQC uses Brent’s method, which achieves superlinear convergence:
In practice, convergence to machine precision typically requires < 50 function evaluations.
Summary
EQC is a fast, accurate, and reliable algorithm for reliability-targeted IRT simulation:
- When to use: Routine simulation studies, initial calibration
-
Key parameters:
target_rho,n_items,M,reliability_metric - Validation: Compare with TAM reliability and/or SPC algorithm
- Typical accuracy: Within ±0.01 of target reliability
For specialized applications or independent validation, see the SPC algorithm vignette.