A subset of 2,832 respondents from the General Social Survey (GSS), containing variables on social attitudes, demographics, and behaviors. The GSS is a nationally representative survey of U.S. adults that has been conducted since 1972. This subset includes measures of attitudes toward abortion and cohabitation, political conservatism, and religiosity, along with demographic characteristics. It is used extensively for logistic regression examples in the course.
Format
A tibble with 2,832 rows and 16 columns:
- abortion
Abortion attitude scale. Type: numeric. Range: (0, 7). Number of conditions (out of 7) under which the respondent believes abortion should be legal. Higher values indicate more permissive attitudes. Stored as double, and not every value is a whole number (the Examples output shows 5.60). NA = 964.
- cohabit
Attitude toward cohabitation. Type: numeric. Range: (2, 10). Higher values indicate more favorable attitudes toward cohabitation. NA = 1,571.
- income
Household income category. Type: numeric. Range: (1, 23). Ordinal income brackets where higher values indicate higher income. NA = 983.
- conserv
Political conservatism. Type: numeric. Range: (1, 7). 7-point scale where 1 = extremely liberal and 7 = extremely conservative. NA = 141.
- educat
Respondent's years of education. Type: numeric. Range: (0, 20). Mean = 13.25. NA = 12.
- male
Sex of respondent. Type: numeric. Binary indicator (0/1) where 1 = male, 0 = female. 44% male.
- maed
Mother's years of education. Type: numeric. Range: (0, 20). Mean = 11.46. NA = 433.
- paed
Father's years of education. Type: numeric. Range: (0, 20). Mean = 11.34. NA = 791.
- marital
Marital status. Type: numeric. Values: 1, 2, 3, 4, 5. Where 1 = married, 2 = widowed, 3 = divorced, 4 = separated, 5 = never married. NA = 1.
- partnrs5
Number of sexual partners in the last 5 years. Type: numeric. Range: (0, 8). Where 8 = 8 or more. NA = 495.
- respage
Respondent's age in years. Type: numeric. Range: (18, 89). Mean = 45.6.
- race
Race of respondent. Type: numeric. Values: 1, 2, 3. Where 1 = White, 2 = Black, 3 = Other.
- black
Black racial identification. Type: numeric. Binary indicator (0/1) where 1 = Black, 0 = not Black. 14% Black.
- othrace
Other race identification. Type: numeric. Binary indicator (0/1) where 1 = Other race (not White or Black), 0 = White or Black. 7% Other.
- relosity
Religiosity composite score. Type: numeric. Range: (-8.68, 4.16). Mean approximately 0. Standardized composite measure of religious behavior and attitudes. Higher values indicate greater religiosity.
- religion
Religious affiliation. Type: numeric. Values: 1, 2, 3, 4, 5. Where 1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = None, 5 = Other. NA = 35.
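The coding scheme above can be illustrated with a short sketch. It assumes that `black` and `othrace` are simple indicators derived from `race`, which is consistent with the definitions listed here but is not confirmed by the source. The six race codes and three religion codes are made-up toy values, not GSS records.

```r
# Toy race codes following the documented scheme: 1 = White, 2 = Black, 3 = Other.
# These values are invented for illustration only.
race <- c(1L, 2L, 3L, 1L, 2L, 1L)

# Indicator (dummy) variables, assuming they mirror `race` as documented.
black   <- as.integer(race == 2L)  # 1 = Black, 0 = not Black
othrace <- as.integer(race == 3L)  # 1 = Other, 0 = White or Black

# Labelled factors, so that models treat nominal codes as categories
# rather than as numbers.
race_f <- factor(race, levels = 1:3,
                 labels = c("White", "Black", "Other"))
religion_f <- factor(c(1L, 4L, 2L), levels = 1:5,
                     labels = c("Protestant", "Catholic", "Jewish",
                                "None", "Other"))

data.frame(race, race_f, black, othrace)
```

With both indicators in a model, White is the implicit reference category, since a respondent with `black = 0` and `othrace = 0` has `race = 1`.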
Source
Smith, T. W., Davern, M., Stier, J., & Marsden, P. V. General Social Surveys, 1972-2018. National Opinion Research Center (NORC) at the University of Chicago.
Original data file: gss_1.dta
Details
This dataset is used in Chapters 9-11 (Simple and Multiple Logistic Regression, Model Fit and Diagnostics). Key analyses include modeling agreement with social policy statements as a function of year (illustrated with a dummy variable for survey year), computing and interpreting odds ratios, and examining how attitudes toward abortion relate to conservatism, education, and religiosity.
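As a minimal sketch of the odds-ratio step, the snippet below exponentiates the `conserv` coefficient reported in the logistic regression output in the Examples section of this page. The coefficients are typed in by hand so the snippet runs without the dataset; the names `b0`, `b1`, `or_conserv`, `p_liberal`, and `p_conservative` are illustrative only.

```r
# Coefficients copied from the printed glm() output in the Examples section.
b0 <- 1.6297   # intercept: log-odds of pro_abortion = 1 when conserv = 0
b1 <- -0.3908  # change in log-odds per one-point increase in conserv

# Odds ratio: multiplicative change in the odds per one-point increase.
or_conserv <- exp(b1)

# Predicted probabilities at the two ends of the 7-point scale.
p_liberal      <- plogis(b0 + b1 * 1)  # conserv = 1, extremely liberal
p_conservative <- plogis(b0 + b1 * 7)  # conserv = 7, extremely conservative

round(c(odds_ratio     = or_conserv,
        p_liberal      = p_liberal,
        p_conservative = p_conservative), 3)
```

Under these coefficients the odds ratio is about 0.68, so each one-point move toward the conservative end multiplies the odds of `pro_abortion = 1` by roughly 0.68 (about a 32% reduction). With a fitted model object, here called `fit` for illustration, `exp(coef(fit))` returns the same quantity for every term.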
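The dataset has substantial missing data on several variables (e.g., cohabit with 55% missing, abortion with 34% missing), providing opportunities to discuss missing data handling and its impact on regression analysis. By default, lm() and glm() drop any row with a missing value on a model variable, so the analytic sample differs from model to model; the logistic regression in the Examples drops 1,066 of the 2,832 rows.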
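The missingness percentages can be reproduced from the NA counts given in the Format section. The sketch below does that arithmetic and then shows the same summary computed from data, using a made-up toy data frame (`toy`) as a stand-in for `gss_1`; the toy values are hypothetical.

```r
n <- 2832  # rows in gss_1, per the Format section

# NA counts copied from the variable descriptions in the Format section.
na_counts <- c(abortion = 964, cohabit = 1571, income = 983, conserv = 141,
               educat = 12, maed = 433, paed = 791, marital = 1,
               partnrs5 = 495, religion = 35)

# Percent missing per variable, largest first.
pct_missing <- sort(round(100 * na_counts / n, 1), decreasing = TRUE)
pct_missing

# The same kind of summary computed directly from a data frame.
# `toy` is invented for illustration; it is not part of the dataset.
toy <- data.frame(y = c(1, NA, 0, 1, NA), x = c(3, 4, NA, 2, 5))
colMeans(is.na(toy)) * 100            # percent missing per column
n_complete <- sum(complete.cases(toy))  # rows a model of y ~ x would keep
n_complete
```

Running `colMeans(is.na(gss_1))` and `sum(complete.cases(gss_1[, c("abortion", "conserv")]))` on the real data is one way to check how many rows a given model will actually use before fitting it.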
Examples
data(gss_1)
head(gss_1)
#> # A tibble: 6 × 16
#>   abortion cohabit income conserv educat  male  maed  paed marital partnrs5
#>      <dbl>   <int>  <int>   <int>  <int> <int> <int> <int>   <int>    <int>
#> 1    NA          7     18       4     12     1    12    12       3        1
#> 2    NA          9     11       2     17     0    NA    20       5        0
#> 3    NA          3     21       6     12     1    12    12       1        1
#> 4     3          8     11       2     13     1    12    NA       5        7
#> 5     7         NA     18       4     16     0    12    NA       5        2
#> 6     5.60      NA     NA       4     16     1     6     9       3        5
#> # ℹ 6 more variables: respage <dbl>, race <int>, black <int>, othrace <int>,
#> #   relosity <dbl>, religion <int>
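# Logistic regression: conservatism predicting permissive abortion attitudes
# Dichotomize the scale: 1 = legal under 4 or more of the 7 conditions, 0 = fewer
# (rows with a missing abortion score stay NA and are dropped by glm())
gss_1$pro_abortion <- as.integer(gss_1$abortion >= 4)
glm(pro_abortion ~ conserv, data = gss_1, family = binomial)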
#>
#> Call: glm(formula = pro_abortion ~ conserv, family = binomial, data = gss_1)
#>
#> Coefficients:
#> (Intercept)      conserv
#>      1.6297      -0.3908
#>
#> Degrees of Freedom: 1765 Total (i.e. Null); 1764 Residual
#> (1066 observations deleted due to missingness)
#> Null Deviance: 2448
#> Residual Deviance: 2324 AIC: 2328
# Multiple regression: education, religiosity, and conservatism predicting abortion attitudes
lm(abortion ~ educat + relosity + conserv, data = gss_1)
#>
#> Call:
#> lm(formula = abortion ~ educat + relosity + conserv, data = gss_1)
#>
#> Coefficients:
#> (Intercept)       educat     relosity      conserv
#>      3.9630       0.1452      -0.2872      -0.4182
#>
