A subset of 2,000 students from the National Education Longitudinal Study of 1988 (NELS:88), focused on school discipline outcomes and their predictors. The dataset includes student demographics, family background variables, and school experience indicators related to disciplinary actions. Variables use original NELS:88 variable naming conventions (e.g., bys55a, bys12). This dataset is used for teaching logistic regression with binary outcomes and handling missing data.

Usage

disc

Format

A tibble with 2,000 rows and 15 columns:

bys55a

Frequency of being sent to the office. Type: numeric. Values: 0, 1, 2, 8, where 0 = never, 1 = once or twice, 2 = more than twice, and 8 = missing/not applicable. Mean = 0.47.

bys12

Family composition. Type: numeric. Values: 1, 2, 8, where 1 = two-parent family, 2 = other family structure, and 8 = missing/not applicable.

bys31a

Hours spent on homework per week. Type: numeric. Range: (1, 8). Ordinal scale where higher values indicate more homework hours, 8 = missing/not applicable.

bys34a

Self-concept: English ability. Type: numeric. Range: (1, 98). Ordinal self-rating, 98 = missing/not applicable.

bys34b

Self-concept: math ability. Type: numeric. Range: (1, 98). Ordinal self-rating, 98 = missing/not applicable.

bys35p

Plans after high school. Type: numeric. Values: 1, 2, 3, 8, where 1 = attend college and 8 = missing/not applicable.

bys36c

Importance of good grades. Type: numeric. Range: (1, 8). Ordinal scale, 8 = missing/not applicable.

sentoff

Ever sent to the office for misbehavior. Type: numeric. Binary indicator (0/1) where 1 = sent to the office at least once, 0 = never sent. NA = 28. This is the primary outcome variable for logistic regression.

male

Sex of student. Type: numeric. Binary indicator (0/1) where 1 = male, 0 = female. NA = 20.

race

Race/ethnicity. Type: factor with levels Asian, Hispanic, Black, White (underlying codes 1 to 4, in that order); Asian is the reference level in regression models. NA = 92.

fath_ed

Father has at least a high school education. Type: numeric. Binary indicator (0/1) where 1 = high school or above, 0 = less than high school. NA = 282.

moth_ed

Mother has at least a high school education. Type: numeric. Binary indicator (0/1) where 1 = high school or above, 0 = less than high school. NA = 208.

bedroom

Has own bedroom. Type: numeric. Binary indicator (0/1) where 1 = has own bedroom, 0 = does not. NA = 33.

discuss

Frequently discusses school with parents. Type: numeric. Binary indicator (0/1) where 1 = frequently discusses, 0 = does not. NA = 33.

osentoff

Ordinal version of sentoff (frequency of being sent to the office). Type: numeric. Values: 0, 1, 2, where 0 = never, 1 = once or twice, and 2 = more than twice. NA = 28. Used for ordinal regression models.
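The value labels above suggest that sentoff and osentoff are recodes of bys55a with the sentinel code 8 set to NA (the matching NA counts of 28 are consistent with this). That derivation is an inference, not something the source documents, and recode_office() is a hypothetical helper name. A minimal sketch under that assumption:

```r
# Hypothetical helper: derive the binary and ordinal outcomes from the raw
# NELS:88 item bys55a (0 = never, 1 = once or twice, 2 = more than twice,
# 8 = missing/not applicable). ASSUMES this is how sentoff/osentoff were built.
recode_office <- function(bys55a) {
  # Ordinal outcome: keep 0/1/2, turn the sentinel 8 into NA
  osentoff <- ifelse(bys55a == 8, NA_integer_, as.integer(bys55a))
  # Binary outcome: any referral (1 or 2) -> 1, never -> 0; NA stays NA
  sentoff <- as.integer(osentoff > 0)
  data.frame(bys55a = bys55a, sentoff = sentoff, osentoff = osentoff)
}

r <- recode_office(c(0, 1, 2, 8, 0))
r
```

With the real data, comparing table(disc$bys55a, disc$osentoff, useNA = "ifany") would show whether this assumed mapping actually holds.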

Source

National Center for Education Statistics (1988). National Education Longitudinal Study of 1988 (NELS:88). U.S. Department of Education. Original data file: disc.dta

Details

This dataset is used in Chapters 9-13 (Logistic Regression, Multiple Logistic Regression, Model Fit, and Ordinal Response Models) to illustrate binary and ordinal logistic regression. Key analyses include fitting simple and multiple logistic regression models that predict office referrals, computing and interpreting odds ratios, conducting likelihood ratio tests, and fitting ordinal regression models with osentoff as the outcome.
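For the ordinal models, one common R tool is polr() from the MASS package, which fits a proportional-odds (cumulative logit) model; the chapters may use a different function, so treat this only as an illustration. The sketch runs on a small simulated data set with hypothetical values (not the NELS:88 data) so that it works without this package installed. With the real data you would instead call something like polr(factor(osentoff, ordered = TRUE) ~ male, data = disc, Hess = TRUE).

```r
library(MASS)  # recommended package shipped with R; provides polr()

# Simulated stand-in data (hypothetical, NOT the NELS:88 values):
# a latent logistic propensity that is shifted upward for male students,
# cut into three ordered categories coded like osentoff (0, 1, 2).
set.seed(1)
n <- 500
male <- rbinom(n, 1, 0.5)
latent <- 0.8 * male + rlogis(n)
osentoff <- cut(latent, breaks = c(-Inf, 1, 2.5, Inf),
                labels = c("0", "1", "2"), ordered_result = TRUE)
sim <- data.frame(male, osentoff)

# Proportional-odds model: logit P(osentoff <= k) = zeta_k - beta * male
fit <- polr(osentoff ~ male, data = sim, Hess = TRUE)

fit$zeta        # two cut points, labelled "0|1" and "1|2"
exp(coef(fit))  # odds ratio for being in a higher category, male vs female
```

Because polr() parameterizes the model as zeta_k minus the linear predictor, a positive coefficient means higher response categories become more likely.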

Note that the raw NELS:88 items (the bys* columns) use coding where 8 or 98 indicates missing/not applicable rather than NA. The derived variables (sentoff, male, race, fath_ed, moth_ed, bedroom, discuss, and osentoff) code missing values as NA.
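Before the raw bys* items are used as numeric predictors, their sentinel codes should be turned into NA; otherwise a value like 98 is treated as an extremely high self-rating. A minimal sketch, assuming the codes listed in the Format section are the only missing-value codes; na_sentinels() is a hypothetical helper, not part of this package, and the toy data frame holds made-up values.

```r
# Hypothetical helper: replace NELS:88 sentinel codes with NA.
# ASSUMES 8 is the missing code for bys55a, bys12, bys31a, bys35p, bys36c
# and 98 for bys34a, bys34b, as listed in the Format section.
na_sentinels <- function(df,
                         codes = c(bys55a = 8, bys12 = 8, bys31a = 8,
                                   bys34a = 98, bys34b = 98,
                                   bys35p = 8, bys36c = 8)) {
  # Only touch columns that are actually present in df
  for (v in intersect(names(codes), names(df))) {
    x <- df[[v]]
    x[!is.na(x) & x == codes[[v]]] <- NA
    df[[v]] <- x
  }
  df
}

# Toy input with made-up values (not rows from disc)
toy <- data.frame(bys12 = c(1, 2, 8), bys34a = c(3, 98, 1))
clean <- na_sentinels(toy)
colSums(is.na(clean))  # count of newly created NAs per column
```

The same call should work on the tibble itself (for example, disc_clean <- na_sentinels(disc)), since tibbles support [[ ]] assignment.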

Examples

data(disc)
head(disc)
#> # A tibble: 6 × 15
#>   bys55a bys12 bys31a bys34a bys34b bys35p bys36c sentoff  male race  fath_ed
#>    <int> <int>  <int>  <int>  <int>  <int>  <int>   <int> <int> <fct>   <int>
#> 1      0     2      4      1      2      1      3       0     0 White       0
#> 2      0     1      4      2      2      1      3       0     1 White       1
#> 3      0     1      4      3      3      1      2       0     1 White       1
#> 4      0     1      4      2      2      1      3       0     1 White       1
#> 5      1     2      3      2      3      2      2       1     0 Black       1
#> 6      0     1      4      4      1      1      3       0     1 White       1
#> # ℹ 4 more variables: moth_ed <int>, bedroom <int>, discuss <int>,
#> #   osentoff <int>

# Logistic regression: gender effect on office referral
glm(sentoff ~ male, data = disc, family = binomial)
#> 
#> Call:  glm(formula = sentoff ~ male, family = binomial, data = disc)
#> 
#> Coefficients:
#> (Intercept)         male  
#>      -1.626        1.209  
#> 
#> Degrees of Freedom: 1953 Total (i.e. Null);  1952 Residual
#>   (46 observations deleted due to missingness)
#> Null Deviance:	    2313 
#> Residual Deviance: 2179 	AIC: 2183
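The male coefficient is on the log-odds scale: exponentiating it gives the odds ratio, and the inverse logit (plogis()) turns a linear predictor into a probability. The sketch below does this arithmetic on the rounded coefficients printed above, so the results are approximate; with a stored model object you would use exp(coef(fit)) and predict(fit, type = "response") instead.

```r
# Coefficients copied from the printed glm(sentoff ~ male, ...) fit above
b0 <- -1.626   # intercept: log-odds of referral for female students (male = 0)
b1 <-  1.209   # male: difference in log-odds, male vs female

odds_ratio <- exp(b1)         # multiplicative change in the odds of referral
p_female   <- plogis(b0)      # inverse logit: 1 / (1 + exp(-b0))
p_male     <- plogis(b0 + b1)

round(c(odds_ratio = odds_ratio, p_female = p_female, p_male = p_male), 3)
```

Under this one-predictor model, male students' odds of an office referral are about 3.35 times those of female students, which corresponds to predicted probabilities of roughly 0.40 versus 0.16.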

# Multiple logistic regression with demographics
glm(sentoff ~ male + factor(race) + fath_ed, data = disc, family = binomial)
#> 
#> Call:  glm(formula = sentoff ~ male + factor(race) + fath_ed, family = binomial, 
#>     data = disc)
#> 
#> Coefficients:
#>          (Intercept)                  male  factor(race)Hispanic  
#>              -1.6235                1.2178                0.3775  
#>    factor(race)Black     factor(race)White               fath_ed  
#>               1.2273                0.3771               -0.5617  
#> 
#> Degrees of Freedom: 1605 Total (i.e. Null);  1600 Residual
#>   (394 observations deleted due to missingness)
#> Null Deviance:	    1892 
#> Residual Deviance: 1752 	AIC: 1764
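The drop from the null deviance to the residual deviance is the likelihood ratio statistic for all five slope terms jointly. Both deviances are computed on the same 1,606 complete cases, so the comparison is valid. The sketch below uses the rounded values printed above; with a stored model object, anova(fit, test = "Chisq") gives the sequential tests directly. The exponentiated race coefficients are odds ratios relative to the reference level, Asian.

```r
# Deviances and degrees of freedom copied from the printed fit above
# (print.glm rounds the deviances, so the statistic is approximate)
null_dev  <- 1892
resid_dev <- 1752
df_null   <- 1605
df_resid  <- 1600

lr_stat <- null_dev - resid_dev   # likelihood ratio chi-square
lr_df   <- df_null - df_resid     # number of slope parameters tested
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(LR = lr_stat, df = lr_df, p = p_value)

# Odds ratios for race relative to the reference level (Asian)
exp(c(Hispanic = 0.3775, Black = 1.2273, White = 0.3771))
```

This model cannot be compared directly with the single-predictor model by a likelihood ratio test: listwise deletion left the two fits with different samples (1,954 vs. 1,606 rows), so both would have to be refit on the same complete cases first.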