PISA 2000 International Reading Assessment Data

Data from the Programme for International Student Assessment (PISA) 2000, an international assessment coordinated by the Organisation for Economic Co-operation and Development (OECD). This subsample of 4,528 fifteen-year-old students from three countries (United Kingdom, Germany, and United States) includes reading literacy scores, gender, parental occupational status, parental education, immigration background, home language, and school identifiers. It is used in regression teaching to illustrate multiple regression with continuous and binary predictors, including comparisons across countries.

Usage

pisa2000

Format

A tibble with 4,528 rows and 15 columns:

country: Country code (ISO 3166-1 numeric). Type: character. Values: "826" = United Kingdom, "276" = Germany, "840" = United States.
female: Student gender indicator. Type: integer. Binary (0/1) where 1 = female, 0 = male.
isei: International Socio-Economic Index of Occupational Status, based on the highest parental occupation. Type: integer. Range: (16, 90). Higher values indicate higher occupational status.
wleread: Weighted likelihood estimate (WLE) of reading proficiency. Type: numeric (double). Range: (161.39, 881.39). Reported on the PISA reading scale, which has an OECD mean of approximately 500 and an SD of approximately 100.
high_sch: Parental education indicator: high school. Type: integer. Binary (0/1) where 1 = highest parental education is high school.
college: Parental education indicator: college or higher. Type: integer. Binary (0/1) where 1 = at least one parent has a college degree.
one_for: Immigration status: one parent foreign-born. Type: integer. Binary (0/1) where 1 = one parent foreign-born.
both_for: Immigration status: both parents foreign-born. Type: integer. Binary (0/1) where 1 = both parents foreign-born.
test_lan: Language of the test spoken at home. Type: integer. Binary (0/1) where 1 = the language of the test is the primary language spoken at home.
reading: Reading proficiency level (ordinal). Type: integer. Range: (0, 5). PISA reading proficiency level where 0 = below Level 1, 1 = Level 1, ..., 5 = Level 5.
pass_rea: Reading benchmark indicator. Type: integer. Binary (0/1) where 1 = proficiency at Level 3 or above.
id_schoo: School identifier. Type: integer. Range: (1, 990).
uk: Country dummy for the United Kingdom. Type: integer. Binary (0/1) where 1 = student from the UK.
germany: Country dummy for Germany. Type: integer. Binary (0/1) where 1 = student from Germany.
usa: Country dummy for the United States. Type: integer. Binary (0/1) where 1 = student from the USA.
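Several columns are derived from others: the three country dummies should mirror the country code, and pass_rea should equal 1 exactly when reading is 3 or higher. The sketch below checks those codings. The helper name check_pisa_coding() is hypothetical (it is not part of the package), the toy rows are made up rather than taken from PISA, and it assumes no missing values in these columns (any NA makes the corresponding result NA).

```r
# Hypothetical helper (not part of the package): checks that the derived
# columns agree with the coding documented in the Format section.
check_pisa_coding <- function(d) {
  # ISO 3166-1 numeric codes used in the country column
  codes <- c(uk = "826", germany = "276", usa = "840")
  dummies_match <-
    all(d$uk == as.integer(d$country == codes[["uk"]])) &&
    all(d$germany == as.integer(d$country == codes[["germany"]])) &&
    all(d$usa == as.integer(d$country == codes[["usa"]]))
  # Exactly one country dummy equals 1 in every row
  sums_to_one <- all(d$uk + d$germany + d$usa == 1)
  # pass_rea should be 1 exactly when the proficiency level is 3 or higher
  pass_matches <- all(d$pass_rea == as.integer(d$reading >= 3))
  c(dummies_match = dummies_match,
    sums_to_one = sums_to_one,
    pass_matches = pass_matches)
}

# Made-up rows that follow the documented coding (NOT PISA data)
toy <- data.frame(
  country  = c("826", "276", "840"),
  uk       = c(1L, 0L, 0L),
  germany  = c(0L, 1L, 0L),
  usa      = c(0L, 0L, 1L),
  reading  = c(2L, 3L, 5L),
  pass_rea = c(0L, 1L, 1L)
)
check_pisa_coding(toy)
```

With the package loaded, check_pisa_coding(pisa2000) should return three TRUE values if the data follow the coding described above.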

Source

OECD (2002). PISA 2000 Technical Report. Paris: OECD Publishing.

Original data file (Programme for International Student Assessment): pisa2000.dta

Details

This dataset is used across multiple chapters to demonstrate regression with international education data. The PISA 2000 assessment measured the reading, mathematics, and science literacy of 15-year-olds in participating OECD and partner countries. Key analyses include multiple regression of reading scores on socioeconomic status (isei) and demographic predictors, logistic regression predicting whether a student meets the reading benchmark (pass_rea), country comparisons using dummy variables, and the effects of immigration background and parental education on student achievement. The country dummies (uk, germany, usa) sum to 1 in every row, so a model with an intercept can include only two of them; the omitted country serves as the reference category.
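The reference-category point and the logistic model for pass_rea can be sketched on a small simulated data frame with the same column names. The data-generating values below are arbitrary and contain no built-in country differences; they are NOT PISA data, and the fitted coefficients mean nothing substantively. The sketch only shows the mechanics: with all three dummies plus an intercept, lm() cannot estimate the last dummy and reports NA for it; dropping uk makes the UK the reference category.

```r
# Simulated stand-in with pisa2000's column names (NOT PISA data)
set.seed(2000)
n <- 300
country <- sample(c("826", "276", "840"), n, replace = TRUE)
sim <- data.frame(
  isei    = sample(16:90, n, replace = TRUE),
  female  = rbinom(n, 1, 0.5),
  uk      = as.integer(country == "826"),
  germany = as.integer(country == "276"),
  usa     = as.integer(country == "840")
)
# Arbitrary data-generating values, chosen only for illustration
sim$wleread  <- 430 + 1.7 * sim$isei + 25 * sim$female + rnorm(n, sd = 90)
sim$pass_rea <- rbinom(n, 1, plogis(-2 + 0.04 * sim$isei))

# uk + germany + usa == 1 duplicates the intercept column, so lm()
# marks the last collinear dummy (usa) as aliased and reports NA
fit_all <- lm(wleread ~ isei + female + uk + germany + usa, data = sim)
coef(fit_all)

# Omit uk: germany and usa coefficients are differences from the UK,
# holding isei and female fixed
fit_ref <- lm(wleread ~ isei + female + germany + usa, data = sim)
coef(fit_ref)

# Logistic regression for the binary benchmark indicator
fit_pass <- glm(pass_rea ~ isei + female + germany + usa,
                family = binomial, data = sim)
summary(fit_pass)
```

Replacing sim with pisa2000 in the three model calls should fit the same specifications to the real data.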

Examples

data(pisa2000)
head(pisa2000)
#> # A tibble: 6 × 15
#>   country female  isei wleread high_sch college one_for both_for test_lan
#>   <chr>    <int> <int>   <dbl>    <int>   <int>   <int>    <int>    <int>
#> 1 826          0    30    539.        0       0       0        0        1
#> 2 826          1    30    599.        0       0       0        0        1
#> 3 826          0    61    502.        0       1       0        0        1
#> 4 826          0    34    555.        0       0       0        0        0
#> 5 826          0    23    465.        0       0       0        0        1
#> 6 826          0    51    565.        0       0       0        0        1
#> # ℹ 6 more variables: reading <int>, pass_rea <int>, id_schoo <int>, uk <int>,
#> #   germany <int>, usa <int>
# Regression of reading on socioeconomic status and gender
lm(wleread ~ isei + female, data = pisa2000)
#> 
#> Call:
#> lm(formula = wleread ~ isei + female, data = pisa2000)
#> 
#> Coefficients:
#> (Intercept)         isei       female  
#>     435.427        1.678       27.721  
#>
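The printed coefficients can be turned into predicted scores by hand: predicted wleread = 435.427 + 1.678 * isei + 27.721 * female. A 10-point difference in isei therefore corresponds to about 16.8 points on the reading scale, and female students are predicted about 27.7 points higher at the same isei. The sketch below does this arithmetic with the rounded values shown above; predict_read() is a hypothetical helper introduced only for illustration.

```r
# Coefficients as printed above (rounded to three decimals)
b <- c(intercept = 435.427, isei = 1.678, female = 27.721)

# Hypothetical helper: predicted reading score from the rounded coefficients
predict_read <- function(isei, female) {
  unname(b["intercept"] + b["isei"] * isei + b["female"] * female)
}

predict_read(isei = 50, female = 0)  # 519.327
predict_read(isei = 50, female = 1)  # 547.048
```

Because the printed coefficients are rounded, predict() on the fitted model object, for example predict(lm(wleread ~ isei + female, data = pisa2000), newdata = data.frame(isei = 50, female = c(0, 1))), gives slightly more precise values.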