Data from the Programme for International Student Assessment (PISA) 2000, an international assessment coordinated by the Organisation for Economic Co-operation and Development (OECD). This subsample of 4,528 fifteen-year-old students from three countries (United Kingdom, Germany, and United States) includes reading literacy scores, parental occupational status, immigration background, and school identifiers. Used in regression teaching to illustrate multiple regression with continuous and binary predictors across countries.
Format
A tibble with 4,528 rows and 15 columns:
- country
Country code (ISO 3166 numeric). Type: character. Values: "826" = United Kingdom, "276" = Germany, "840" = United States.
- female
Student gender indicator. Type: numeric. Binary (0/1) where 1 = female, 0 = male.
- isei
International Socio-Economic Index of Occupational Status, based on highest parental occupation. Type: numeric. Range: (16, 90). Higher values indicate higher occupational status.
- wleread
Weighted likelihood estimate of reading proficiency. Type: numeric. Range: (161.39, 881.39). PISA scale with international mean approximately 500 and SD approximately 100.
- high_sch
Parental education: high school only indicator. Type: numeric. Binary (0/1) where 1 = highest parental education is high school.
- college
Parental education: college or higher indicator. Type: numeric. Binary (0/1) where 1 = at least one parent has college degree.
- one_for
Immigration status: one parent foreign-born. Type: numeric. Binary (0/1) where 1 = one parent foreign-born.
- both_for
Immigration status: both parents foreign-born. Type: numeric. Binary (0/1) where 1 = both parents foreign-born.
- test_lan
Test language spoken at home. Type: numeric. Binary (0/1) where 1 = language of the test is the primary language spoken at home.
- reading
Reading proficiency level (ordinal). Type: numeric. Range: (0, 5). PISA reading proficiency level where 0 = below Level 1, 1 = Level 1, ..., 5 = Level 5.
- pass_rea
Pass reading benchmark indicator. Type: numeric. Binary (0/1) where 1 = proficiency at or above Level 3.
- id_schoo
School identifier. Type: numeric. Range: (1, 990).
- uk
Country dummy for United Kingdom. Type: numeric. Binary (0/1) where 1 = student from UK.
- germany
Country dummy for Germany. Type: numeric. Binary (0/1) where 1 = student from Germany.
- usa
Country dummy for United States. Type: numeric. Binary (0/1) where 1 = student from USA.
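The coding described in this list can be checked directly. The following is a minimal sketch, assuming the columns behave as documented. It runs on a small synthetic stand-in (`toy`, a hypothetical name with made-up values) so that it does not depend on the package being installed. The helper `check_coding()` is likewise illustrative and not part of any package. Pass `pisa2000` instead of `toy` to check the real data.

```r
# Synthetic stand-in with the documented column names (values are made up).
toy <- data.frame(
  country = c("826", "826", "276", "276", "840", "840"),
  reading = c(0L, 3L, 2L, 5L, 4L, 1L),
  uk      = c(1L, 1L, 0L, 0L, 0L, 0L),
  germany = c(0L, 0L, 1L, 1L, 0L, 0L),
  usa     = c(0L, 0L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)
# pass_rea as documented: 1 when the proficiency level is 3 or higher.
toy$pass_rea <- as.integer(toy$reading >= 3)

# Hypothetical helper: returns one TRUE/FALSE per documented coding rule.
check_coding <- function(d) {
  c(
    # Exactly one country dummy is set in every row.
    dummies_sum_to_one   = all(d$uk + d$germany + d$usa == 1),
    # Each dummy agrees with the ISO 3166 numeric code in `country`.
    uk_matches_code      = all(d$uk == (d$country == "826")),
    germany_matches_code = all(d$germany == (d$country == "276")),
    usa_matches_code     = all(d$usa == (d$country == "840")),
    # pass_rea agrees with the Level 3 cutoff on `reading`.
    pass_matches_level   = all(d$pass_rea == as.integer(d$reading >= 3))
  )
}

checks <- check_coding(toy)
print(checks)
```

Any `FALSE` entry on the real data would point to a mismatch between the data and this description, not necessarily to an error in the data.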
Source
OECD (2002). PISA 2000 Technical Report. Paris: OECD Publishing. Programme for International Student Assessment.
Original data file: pisa2000.dta
Details
This dataset is used in multiple chapters to demonstrate regression with international education data. The PISA 2000 assessment measured reading, mathematics, and science literacy of 15-year-olds in participating OECD and partner countries. Key analyses include multiple regression of reading scores on SES and demographic predictors, logistic regression predicting whether a student passes the reading benchmark (pass_rea), country comparisons using dummy variables, and examination of the effects of immigration background and parental education on student achievement. The country dummies (uk, germany, usa) sum to 1 in every row, so a model with an intercept can include at most two of them; the omitted country serves as the reference category.
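The role of the reference category can be illustrated with a short sketch. It uses simulated data (`toy`, a hypothetical stand-in whose values and data-generating process are invented for illustration) with the same column names, so the coefficients say nothing about the real PISA results. Replace `toy` with `pisa2000` to run the same models on the actual data.

```r
set.seed(1)
n <- 90

# Simulated stand-in: 30 students per country, made-up scores.
toy <- data.frame(
  country = rep(c("826", "276", "840"), each = n / 3),
  isei    = sample(16:90, n, replace = TRUE)
)
toy$uk      <- as.integer(toy$country == "826")
toy$germany <- as.integer(toy$country == "276")
toy$usa     <- as.integer(toy$country == "840")
toy$wleread <- 400 + 1.5 * toy$isei + 20 * toy$uk + rnorm(n, sd = 50)

# Two dummies plus an intercept: the omitted country (usa) is the reference.
fit_ref <- lm(wleread ~ isei + uk + germany, data = toy)

# All three dummies plus an intercept are perfectly collinear,
# so lm() cannot estimate the last one and reports NA for it.
fit_all <- lm(wleread ~ isei + uk + germany + usa, data = toy)

# Same model via a factor with an explicit reference level.
toy$country_f <- relevel(factor(toy$country), ref = "840")
fit_factor <- lm(wleread ~ isei + country_f, data = toy)

coef(fit_ref)
coef(fit_all)
coef(fit_factor)
```

In `fit_ref`, the `uk` and `germany` coefficients are differences from the omitted country at a given `isei`. Choosing a different reference changes those coefficients but not the fitted values.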
Examples
data(pisa2000)
head(pisa2000)
#> # A tibble: 6 × 15
#>   country female  isei wleread high_sch college one_for both_for test_lan
#>   <chr>    <int> <int>   <dbl>    <int>   <int>   <int>    <int>    <int>
#> 1 826          0    30    539.        0       0       0        0        1
#> 2 826          1    30    599.        0       0       0        0        1
#> 3 826          0    61    502.        0       1       0        0        1
#> 4 826          0    34    555.        0       0       0        0        0
#> 5 826          0    23    465.        0       0       0        0        1
#> 6 826          0    51    565.        0       0       0        0        1
#> # ℹ 6 more variables: reading <int>, pass_rea <int>, id_schoo <int>, uk <int>,
#> # germany <int>, usa <int>
# Regression of reading on socioeconomic status and gender
lm(wleread ~ isei + female, data = pisa2000)
#>
#> Call:
#> lm(formula = wleread ~ isei + female, data = pisa2000)
#>
#> Coefficients:
#> (Intercept)         isei       female
#>     435.427        1.678       27.721
#>
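
The logistic regression on the reading benchmark mentioned under Details can be sketched in the same way. The snippet below is a minimal illustration on simulated data (`toy` and its data-generating process are invented here, and `new_student` is a hypothetical profile), so the estimates are not PISA results. With the package data loaded, the same `glm()` call can be run with `data = pisa2000`.

```r
set.seed(2)
n <- 200

# Simulated stand-in with the documented column names.
toy <- data.frame(
  isei   = sample(16:90, n, replace = TRUE),
  female = rbinom(n, 1, 0.5)
)
# Made-up data-generating process, only to give glm() something to fit.
p <- plogis(-2 + 0.04 * toy$isei + 0.5 * toy$female)
toy$pass_rea <- rbinom(n, 1, p)

# Logistic regression of the pass indicator on SES and gender.
fit_logit <- glm(pass_rea ~ isei + female, family = binomial, data = toy)

# Exponentiated coefficients are odds ratios per one-unit increase.
odds_ratios <- exp(coef(fit_logit))

# Predicted pass probability for a hypothetical student profile.
new_student <- data.frame(isei = 50, female = 1)
p_hat <- predict(fit_logit, newdata = new_student, type = "response")

odds_ratios
p_hat
```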
