Salary and career information for 514 faculty members, including years of experience, academic rank, market conditions, and demographic variables. This dataset is commonly used to examine gender salary disparities while controlling for legitimate predictors of salary such as experience, rank, and market factors. It provides a rich example for multiple regression with a mix of continuous and binary predictors.
Format
A tibble with 514 rows and 10 columns:
- salary
Annual salary in dollars. Type: numeric. Range: 29,000 to 96,156. Mean = 50,864. This is the primary outcome variable.
- exprior
Years of experience prior to current position. Type: integer. Range: 0 to 25. Mean = 2.9.
- yearsbg
Years since earning the bachelor's degree. Type: integer. Range: 0 to 41. Mean = 12.9.
- yearsrank
Years at current academic rank. Type: integer. Range: 0 to 28. Mean = 7.1.
- market
Market adjustment factor reflecting disciplinary salary norms. Type: numeric. Range: 0.71 to 1.33. Mean = 0.95. Values above 1.0 indicate fields with above-average market salaries.
- termdeg
Holds a terminal degree (e.g., Ph.D.). Type: integer. Binary indicator (0/1) where 1 = holds terminal degree, 0 = does not. 99% hold a terminal degree.
- admin
Holds an administrative appointment. Type: integer. Binary indicator (0/1) where 1 = has administrative role, 0 = does not. Only 3% have administrative roles.
- yearsdg
Years since earning the highest degree. Type: integer. Range: 0 to 41. Mean = 15.3.
- rank
Academic rank. Type: integer. Values: 1 = Assistant Professor, 2 = Associate Professor, 3 = Full Professor. Mean = 2.1.
- male
Sex of faculty member. Type: integer. Binary indicator (0/1) where 1 = male, 0 = female. 75% are male.
Details
This dataset is used in Chapters 4-8 (Multiple Regression with Continuous Predictors, Interactions, Nonlinear Relationships, Model Building, and Model Diagnostics). Key analyses include estimating the gender salary gap while controlling for experience, rank, and market factors; exploring nonlinear relationships (e.g., diminishing returns to experience); building models by forward selection and backward elimination; and running regression diagnostics, including detection of outliers and influential observations.
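The model-building and diagnostic steps listed above can be sketched in base R. This is only an illustration under stated assumptions, not necessarily the workflow used in the chapters: it assumes the package that provides faculty is already attached, the candidate predictor set and the quadratic term in yearsdg are choices made here, and step() selects terms by AIC rather than by p-values.

```r
# Assumes the package that provides `faculty` is already attached.
data(faculty)

# Candidate model. The quadratic term in yearsdg allows for diminishing
# returns to experience (an illustrative choice, not the book's model).
full <- lm(salary ~ male + factor(rank) + yearsdg + I(yearsdg^2) +
             yearsrank + exprior + market + admin,
           data = faculty)

# Backward elimination from the full model, using AIC as the criterion.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection from the intercept-only model over the same terms.
null <- lm(salary ~ 1, data = faculty)
forward <- step(null, scope = formula(full), direction = "forward",
                trace = 0)

# Influence diagnostics: list observations whose Cook's distance exceeds
# the common 4/n rule of thumb.
cooks <- cooks.distance(backward)
which(cooks > 4 / nrow(faculty))
```

Forward and backward searches need not arrive at the same model, so comparing formula(forward) with formula(backward) is a useful check before interpreting the coefficient on male.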
Note that the experience variables (exprior, yearsbg, yearsrank, yearsdg) are correlated with each other and with rank, which makes the dataset a useful case for discussing collinearity. Because the salary distribution is right-skewed, modeling log(salary) may improve model fit.
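A quick way to look at both points, as a minimal sketch in base R. It assumes the package that provides faculty is already attached; the hand-computed variance inflation factor and the set of predictors it is computed against are illustrative choices.

```r
# Assumes the package that provides `faculty` is already attached.
data(faculty)

# Pairwise correlations among the experience variables and rank.
exp_vars <- c("exprior", "yearsbg", "yearsrank", "yearsdg", "rank")
round(cor(faculty[, exp_vars]), 2)

# Variance inflation factor for yearsdg, computed by hand as 1 / (1 - R^2)
# from regressing yearsdg on the other predictors in this illustrative set.
r2 <- summary(lm(yearsdg ~ exprior + yearsbg + yearsrank + factor(rank),
                 data = faculty))$r.squared
vif_yearsdg <- 1 / (1 - r2)
vif_yearsdg

# Right skew typically shows up as a mean above the median; compare the
# summaries before and after taking logs.
summary(faculty$salary)
summary(log(faculty$salary))
```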
Examples
data(faculty)
head(faculty)
#> # A tibble: 6 × 10
#>   salary exprior yearsbg yearsrank market termdeg admin yearsdg  rank  male
#>    <dbl>   <int>   <int>     <int>  <dbl>   <int> <int>   <int> <int> <int>
#> 1 38362.       0      14         2  0.720       1     0      14     2     0
#> 2 68906        2      29        20  1           1     0      31     3     1
#> 3 55979        0      14         3  1.04        1     0      14     3     1
#> 4 61008        0       3         3  1.24        1     0       2     1     1
#> 5 42977        5       7         1  0.990       1     0      12     2     1
#> 6 51640        4      13         5  0.990       1     0      17     3     0
# Gender salary gap, unadjusted
lm(salary ~ male, data = faculty)
#>
#> Call:
#> lm(formula = salary ~ male, data = faculty)
#>
#> Coefficients:
#> (Intercept)         male
#>       42917        10583
#>
# Adjusted for rank, experience, and market
lm(salary ~ male + factor(rank) + yearsdg + market, data = faculty)
#>
#> Call:
#> lm(formula = salary ~ male + factor(rank) + yearsdg + market,
#>     data = faculty)
#>
#> Coefficients:
#>   (Intercept)           male  factor(rank)2  factor(rank)3        yearsdg
#>         986.8         1084.1         3100.3        11132.8          556.4
#>        market
#>       36931.7
#>
# Log salary model
lm(log(salary) ~ male + factor(rank) + yearsdg + market, data = faculty)
#>
#> Call:
#> lm(formula = log(salary) ~ male + factor(rank) + yearsdg + market,
#>     data = faculty)
#>
#> Coefficients:
#>   (Intercept)           male  factor(rank)2  factor(rank)3        yearsdg
#>       9.79315        0.02274        0.09411        0.24440        0.01048
#>        market
#>       0.74433
#>
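In the log salary model, a coefficient b is read as a proportional change in salary, exp(b) - 1, per one-unit increase in the predictor with the others held fixed. The short sketch below applies that conversion to two coefficients copied from the printed output above; the object names are illustrative.

```r
# Coefficients copied from the log salary model output above.
b_log <- c(male = 0.02274, yearsdg = 0.01048)

# exp(b) - 1 is the proportional change in salary per one-unit increase
# in the predictor, holding the other predictors fixed.
pct_change <- 100 * (exp(b_log) - 1)
round(pct_change, 2)
```

On this reading, the adjusted male coefficient corresponds to a salary difference of roughly 2.3%. For comparison, the unadjusted model gives a gap of 10583 on an intercept (the mean for male = 0) of 42917, or about 24.7%.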
