regdatasets provides a curated collection of 24 datasets for teaching introductory and intermediate regression analysis in education and social science contexts.
Installation
Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("joonho112/regdatasets")Datasets
| Dataset | N | Variables | Topic |
|---|---|---|---|
gcse |
4,059 | 9 | GCSE exam scores and London Reading Test (Chapters 1–2, 5) |
crime |
67 | 5 | Florida county crime rates (Chapters 4–5) |
instruction |
1,190 | 12 | Math achievement gains, multilevel (Chapters 2–3) |
reading |
66 | 8 | Three-group reading instruction experiment (Chapter 3) |
pisa2000 |
4,528 | 15 | OECD PISA reading scores, 3 countries (Chapter 1) |
individuals |
55,899 | 6 | BLS wage data by gender and sector (Chapter 5) |
faculty |
514 | 10 | Faculty salaries and experience (Chapter 5) |
nels_data |
2,000 | 299 | National Education Longitudinal Study (Chapters 7–8) |
hsbs1 |
350 | 16 | High School and Beyond survey (Chapter 8) |
gss_1 |
2,832 | 16 | General Social Survey 1982/1994 (Chapter 9) |
berkeley |
3,593 | 3 | UC Berkeley admissions, Simpson’s paradox (Chapter 10) |
penalty |
674 | 3 | Death penalty sentencing by race (Chapter 11) |
titanic |
1,309 | 6 | Titanic passenger survival (Chapters 9–11) |
womenlf |
263 | 5 | Women’s labor force participation (Chapter 13) |
satisfaction |
110 | 3 | Ordinal satisfaction survey (Chapter 13) |
alcohol1_pp |
246 | 9 | Adolescent alcohol use, person-period (Chapter 13) |
lambert |
492 | 109 | Canadian longitudinal child study (Chapter 12) |
civic_ed |
794 | 237 | Civic education pre/post study (Chapters 1, 7) |
disc |
2,000 | 15 | NELS:88 discipline study (Chapters 9–10) |
disc2 |
2,000 | 15 | NELS:88 with achievement scores (Chapters 10–11) |
berk_sub |
1,169 | 3 | Berkeley admissions subset (Chapter 10) |
hsb_sub |
188 | 7 | High School and Beyond subset (Chapter 7) |
classdata_07 |
44 | 12 | Classroom survey data (Chapter 1) |
grades |
198 | 8 | Essay grades and writing features (Chapter 12) |
Quick Start
library(regdatasets)
# --- Simple linear regression ---
data(gcse)
model <- lm(gcse ~ lrt, data = gcse)
summary(model)
# --- Logistic regression: Simpson's paradox ---
data(berkeley)
# Unadjusted: gender appears to matter
crude <- glm(admitted ~ female, family = binomial, data = berkeley)
# Adjusted: gender effect disappears after controlling for department
adjusted <- glm(admitted ~ female + factor(department),
family = binomial, data = berkeley)
exp(coef(crude)["female"]) # Crude OR ~ 0.64
exp(coef(adjusted)["female"]) # Adjusted OR ~ 0.97
# --- One-way ANOVA ---
data(reading)
model_anova <- lm(post1 ~ factor(group), data = reading)
anova(model_anova)Works Well With
regdatasets is designed to work seamlessly with modern R workflows:
library(tidyverse) # dplyr, ggplot2, tibble, ...
library(easystats) # parameters, performance, effectsize, see, ...
library(gtsummary) # publication-ready tables
library(marginaleffects) # predictions, comparisons, slopesExample using the easystats and marginaleffects ecosystem:
library(regdatasets)
library(easystats)
library(marginaleffects)
# Fit model
data(gcse)
model <- lm(gcse ~ lrt * school, data = gcse)
# easystats: model summary and diagnostics
model_parameters(model)
check_model(model)
# marginaleffects: predicted values and slopes
plot_predictions(model, condition = c("lrt", "school"))
avg_slopes(model, variables = "lrt", by = "school")Course Coverage
| Chapters | Topic | Datasets |
|---|---|---|
| 1–2 | SLR, ANCOVA |
gcse, civic_ed, classdata_07, pisa2000
|
| 3 | One-way ANOVA |
reading, instruction
|
| 4 | Continuous predictors | crime |
| 5 | Interactions |
gcse, individuals, faculty, crime
|
| 6 | Nonlinear relationships |
nels_data, crime
|
| 7–8 | Model building & diagnostics |
nels_data, hsbs1, hsb_sub
|
| 9–11 | Logistic regression |
gss_1, berkeley, penalty, titanic, disc
|
| 12 | Latent response & GLM |
lambert, grades
|
| 13 | Ordinal models |
womenlf, satisfaction, alcohol1_pp
|
Companion Textbook
This package accompanies Regression Lecture Notes, a Quarto-based interactive textbook for graduate-level regression analysis in education and social science.
