NELS:88 Discipline and School Experiences Study

A subset of 2,000 students from the National Education Longitudinal Study of 1988 (NELS:88), focused on school discipline outcomes and their predictors. The dataset includes student demographics, family background variables, and school experience indicators related to disciplinary actions. Survey items keep their original NELS:88 variable names (e.g., bys55a, bys12). The dataset is intended for teaching logistic regression with binary outcomes and the handling of missing data.

Usage

disc

Format

A tibble with 2,000 rows and 15 columns:

bys55a: Frequency of being sent to the office. Type: numeric. Values: 0, 1, 2, 8. Where 0 = never, 1 = once or twice, 2 = more than twice, 8 = missing/not applicable. Mean = 0.47.
bys12: Family composition. Type: numeric. Values: 1, 2, 8. Where 1 = two-parent family, 2 = other family structure, 8 = missing/not applicable.
bys31a: Hours spent on homework per week. Type: numeric. Range: (1, 8). Ordinal scale where higher values indicate more homework hours, 8 = missing/not applicable.
bys34a: Self-concept: English ability. Type: numeric. Range: (1, 98). Ordinal self-rating, 98 = missing/not applicable.
bys34b: Self-concept: math ability. Type: numeric. Range: (1, 98). Ordinal self-rating, 98 = missing/not applicable.
bys35p: Plans after high school. Type: numeric. Values: 1, 2, 3, 8. Where 1 = attend college, 8 = missing/not applicable.
bys36c: Importance of good grades. Type: numeric. Range: (1, 8). Ordinal scale, 8 = missing/not applicable.
sentoff: Ever sent to the office for misbehavior. Type: numeric. Binary indicator (0/1) where 1 = sent to the office at least once, 0 = never sent. NA = 28. This is the primary outcome variable for logistic regression.
male: Sex of student. Type: numeric. Binary indicator (0/1) where 1 = male, 0 = female. NA = 20.
race: Race/ethnicity. Type: factor. Levels, in order: Asian, Hispanic, Black, White (Asian is the reference level in the models under Examples). NA = 92.
fath_ed: Father has at least a high school education. Type: numeric. Binary indicator (0/1) where 1 = high school or above, 0 = less than high school. NA = 282.
moth_ed: Mother has at least a high school education. Type: numeric. Binary indicator (0/1) where 1 = high school or above, 0 = less than high school. NA = 208.
bedroom: Has own bedroom. Type: numeric. Binary indicator (0/1) where 1 = has own bedroom, 0 = does not. NA = 33.
discuss: Frequently discusses school with parents. Type: numeric. Binary indicator (0/1) where 1 = frequently discusses, 0 = does not. NA = 33.
osentoff: Ordinal version of sent to office. Type: numeric. Values: 0, 1, 2. Where 0 = never, 1 = once or twice, 2 = more than twice. NA = 28. Used for ordinal regression models.

Source

National Center for Education Statistics (1988). National Education Longitudinal Study of 1988 (NELS:88). U.S. Department of Education. Original data file: disc.dta

Details

This dataset is used in Chapters 9-13 (Logistic Regression, Multiple Logistic Regression, Model Fit, and Ordinal Response Models) to illustrate binary and ordinal logistic regression. Key analyses include fitting simple and multiple logistic regression models that predict office referrals, computing and interpreting odds ratios, running likelihood ratio tests, and fitting ordinal regression models with osentoff as the outcome.
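The odds ratio and likelihood ratio test steps can be sketched as follows. This is a minimal illustration, not the chapters' own code: the odds ratio uses the male coefficient printed under Examples (1.209), while the likelihood ratio test runs on a simulated stand-in data frame (`toy`, with invented values and hypothetical effect sizes) so that the block is self-contained.

```r
# Odds ratio for male, from the coefficient printed under Examples.
or_male <- exp(1.209)
round(or_male, 2)   # about 3.35: odds of an office referral for males vs. females

# Simulated stand-in for disc (hypothetical values, NOT the real data).
set.seed(88)
n <- 600
toy <- data.frame(
  male    = rbinom(n, 1, 0.5),
  fath_ed = rbinom(n, 1, 0.8)
)
toy$sentoff <- rbinom(n, 1, plogis(-1.6 + 1.2 * toy$male - 0.5 * toy$fath_ed))
toy$fath_ed[sample(n, 60)] <- NA   # mimic item nonresponse

# Nested models must be fitted to the same rows, so keep only the rows
# that are complete on every variable used by the larger model.
cc <- toy[complete.cases(toy[, c("sentoff", "male", "fath_ed")]), ]

fit_small <- glm(sentoff ~ male, data = cc, family = binomial)
fit_large <- glm(sentoff ~ male + fath_ed, data = cc, family = binomial)

# Likelihood ratio test: the drop in deviance is compared to a chi-square.
lrt <- anova(fit_small, fit_large, test = "Chisq")
lrt
```

With disc itself, glm() drops rows that have missing values on any model variable, so two models with different predictors can end up fitted to different rows, and anova() then stops with an error about dataset size. Subsetting to complete cases first, as above, avoids that.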

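The survey items bys55a, bys12, bys31a, bys35p, and bys36c use the NELS:88 code 8, and bys34a and bys34b use 98, to mark missing/not applicable rather than NA. R treats these codes as ordinary numbers, so recode them to NA before computing summaries or fitting models with these items. The derived binary and ordinal outcome variables (sentoff, osentoff) have already been recoded with proper NA values.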
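One way to handle the 8 and 98 codes is to map them to NA column by column. The sketch below is an assumption about workflow, not part of the package: `recode_sentinels()` is a hypothetical helper, and `toy` is a small invented data frame that borrows three column names from disc.

```r
# Small invented data frame borrowing column names from disc.
toy <- data.frame(
  bys12  = c(1L, 2L, 8L, 1L),
  bys31a = c(4L, 3L, 8L, 2L),
  bys34a = c(2L, 98L, 3L, 1L)
)

# Missing/not-applicable code for each column, as listed under Format.
sentinels <- list(bys12 = 8L, bys31a = 8L, bys34a = 98L)

# Hypothetical helper: replace each column's sentinel code with NA.
recode_sentinels <- function(df, codes) {
  for (col in names(codes)) {
    df[[col]][df[[col]] %in% codes[[col]]] <- NA
  }
  df
}

toy_clean <- recode_sentinels(toy, sentinels)
colSums(is.na(toy_clean))   # one NA per column in this toy example
```

Keeping the codes in a named list makes the per-column difference (8 versus 98) explicit, so a legitimate value of 8 on one item is never blanked by accident.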

Examples

data(disc)
head(disc)
#> # A tibble: 6 × 15
#>   bys55a bys12 bys31a bys34a bys34b bys35p bys36c sentoff  male race  fath_ed
#>    <int> <int>  <int>  <int>  <int>  <int>  <int>   <int> <int> <fct>   <int>
#> 1      0     2      4      1      2      1      3       0     0 White       0
#> 2      0     1      4      2      2      1      3       0     1 White       1
#> 3      0     1      4      3      3      1      2       0     1 White       1
#> 4      0     1      4      2      2      1      3       0     1 White       1
#> 5      1     2      3      2      3      2      2       1     0 Black       1
#> 6      0     1      4      4      1      1      3       0     1 White       1
#> # ℹ 4 more variables: moth_ed <int>, bedroom <int>, discuss <int>,
#> #   osentoff <int>

# Logistic regression: effect of sex (male) on office referral
glm(sentoff ~ male, data = disc, family = binomial)
#> 
#> Call:  glm(formula = sentoff ~ male, family = binomial, data = disc)
#> 
#> Coefficients:
#> (Intercept)         male  
#>      -1.626        1.209  
#> 
#> Degrees of Freedom: 1953 Total (i.e. Null);  1952 Residual
#>   (46 observations deleted due to missingness)
#> Null Deviance:	    2313 
#> Residual Deviance: 2179 	AIC: 2183

# Multiple logistic regression with demographics
glm(sentoff ~ male + factor(race) + fath_ed, data = disc, family = binomial)
#> 
#> Call:  glm(formula = sentoff ~ male + factor(race) + fath_ed, family = binomial, 
#>     data = disc)
#> 
#> Coefficients:
#>          (Intercept)                  male  factor(race)Hispanic  
#>              -1.6235                1.2178                0.3775  
#>    factor(race)Black     factor(race)White               fath_ed  
#>               1.2273                0.3771               -0.5617  
#> 
#> Degrees of Freedom: 1605 Total (i.e. Null);  1600 Residual
#>   (394 observations deleted due to missingness)
#> Null Deviance:	    1892 
#> Residual Deviance: 1752 	AIC: 1764
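
Details mentions ordinal regression with osentoff as the outcome, but none of the examples above fits such a model. A minimal sketch follows, assuming a proportional-odds model via MASS::polr() is an acceptable stand-in for whatever the chapters use. The data frame `toy` is simulated with hypothetical effect sizes and cutpoints; it is not the real disc data.

```r
# Simulated stand-in for disc (hypothetical values, NOT the real data).
set.seed(1988)
n <- 1000
toy <- data.frame(male = rbinom(n, 1, 0.5))

# Latent-variable construction: cut a logistic latent score into 0, 1, 2.
latent <- 1.2 * toy$male + rlogis(n)
toy$osentoff <- findInterval(latent, c(1.6, 3.0))

# polr() needs an ordered factor as the response.
toy$osentoff_f <- factor(
  toy$osentoff,
  levels  = 0:2,
  labels  = c("never", "once or twice", "more than twice"),
  ordered = TRUE
)

# Proportional-odds logistic regression (MASS ships with standard R installs).
fit_ord <- MASS::polr(osentoff_f ~ male, data = toy,
                      method = "logistic", Hess = TRUE)

exp(coef(fit_ord))   # cumulative odds ratio for male
fit_ord$zeta         # the two estimated cutpoints
```

With disc, the same call would use osentoff converted to an ordered factor in the same way. Rows with NA on osentoff or a predictor are dropped under R's default na.action, as in the glm() examples above.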