Data from the Programme for International Student Assessment (PISA) 2000, an international assessment coordinated by the Organisation for Economic Co-operation and Development (OECD). This subsample of 4,528 fifteen-year-old students from three countries (United Kingdom, Germany, and United States) includes reading literacy scores, parental occupational status, immigration background, and school identifiers. It is used in regression teaching to illustrate multiple regression with continuous and binary predictors across countries.

Usage

pisa2000

Format

A tibble with 4,528 rows and 15 columns:

country

Country code (ISO 3166 numeric). Type: character. Values: "826" = United Kingdom, "276" = Germany, "840" = United States.

female

Student gender indicator. Type: numeric. Binary (0/1) where 1 = female, 0 = male.

isei

International Socio-Economic Index of Occupational Status, based on highest parental occupation. Type: numeric. Range: (16, 90). Higher values indicate higher occupational status.

wleread

Weighted likelihood estimate of reading proficiency. Type: numeric. Range: (161.39, 881.39). PISA scale with international mean approximately 500 and SD approximately 100.

high_sch

Parental education: high school only indicator. Type: numeric. Binary (0/1) where 1 = highest parental education is high school.

college

Parental education: college or higher indicator. Type: numeric. Binary (0/1) where 1 = at least one parent has a college degree or higher.

one_for

Immigration status: one parent foreign-born. Type: numeric. Binary (0/1) where 1 = one parent foreign-born.

both_for

Immigration status: both parents foreign-born. Type: numeric. Binary (0/1) where 1 = both parents foreign-born.

test_lan

Test language spoken at home. Type: numeric. Binary (0/1) where 1 = language of the test is the primary language spoken at home.

reading

Reading proficiency level (ordinal). Type: numeric. Range: (0, 5). PISA reading proficiency level where 0 = below Level 1, 1 = Level 1, ..., 5 = Level 5.

pass_rea

Pass reading benchmark indicator. Type: numeric. Binary (0/1) where 1 = proficiency at or above Level 3.

id_schoo

School identifier. Type: numeric. Range: (1, 990).

uk

Country dummy for United Kingdom. Type: numeric. Binary (0/1) where 1 = student from UK.

germany

Country dummy for Germany. Type: numeric. Binary (0/1) where 1 = student from Germany.

usa

Country dummy for United States. Type: numeric. Binary (0/1) where 1 = student from USA.

Source

OECD (2002). PISA 2000 Technical Report. Paris: OECD Publishing. Programme for International Student Assessment. Original data file: pisa2000.dta

Details

This dataset is used in several chapters to demonstrate regression with international education data. The PISA 2000 assessment measured the reading, mathematical, and scientific literacy of 15-year-olds in participating OECD and partner countries. Key analyses include multiple regression of reading scores on SES and demographic predictors, logistic regression predicting whether a student reaches the reading benchmark, country comparisons using dummy variables, and examination of the effects of immigration background and parental education on student achievement. The country dummies (uk, germany, usa) sum to 1 in every row, so a model with an intercept can include at most two of them; the omitted dummy defines the reference category.
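
The country comparison and the benchmark model described above can be sketched as follows. This is a minimal illustration, not the analysis from any particular chapter: `fit_pisa_models` is a hypothetical helper that is not part of the package, and the choice of `uk` as the reference category and of the predictors in the logistic model are assumptions made for the example. If all three dummies were entered together with an intercept, `lm()` would report `NA` for one of them because of perfect collinearity.

```r
# Hypothetical helper (not part of the package): fits two of the models
# described in Details to a data frame that has the pisa2000 column names.
fit_pisa_models <- function(d) {
  # The three dummies are exhaustive and mutually exclusive,
  # so they must sum to 1 in every row.
  stopifnot(all(d$uk + d$germany + d$usa == 1))

  list(
    # uk is left out and acts as the reference category (an assumption made
    # here); the germany and usa coefficients are differences from the UK
    # at given values of isei and female.
    country = lm(wleread ~ isei + female + germany + usa, data = d),

    # Logistic regression for reaching the reading benchmark.
    # The predictor set is illustrative only.
    benchmark = glm(pass_rea ~ isei + female + both_for + test_lan,
                    family = binomial, data = d)
  )
}

# Usage, assuming the package that provides pisa2000 is attached:
# fits <- fit_pisa_models(pisa2000)
# coef(fits$country)
# exp(coef(fits$benchmark))  # coefficients on the odds-ratio scale
```

To use a different reference category, drop that dummy from the formula instead of `uk`; the fitted values are unchanged, and only the interpretation of the intercept and the country coefficients shifts.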

Examples

data(pisa2000)
head(pisa2000)
#> # A tibble: 6 × 15
#>   country female  isei wleread high_sch college one_for both_for test_lan
#>   <chr>    <int> <int>   <dbl>    <int>   <int>   <int>    <int>    <int>
#> 1 826          0    30    539.        0       0       0        0        1
#> 2 826          1    30    599.        0       0       0        0        1
#> 3 826          0    61    502.        0       1       0        0        1
#> 4 826          0    34    555.        0       0       0        0        0
#> 5 826          0    23    465.        0       0       0        0        1
#> 6 826          0    51    565.        0       0       0        0        1
#> # ℹ 6 more variables: reading <int>, pass_rea <int>, id_schoo <int>, uk <int>,
#> #   germany <int>, usa <int>
# Regression of reading on socioeconomic status and gender
lm(wleread ~ isei + female, data = pisa2000)
#> 
#> Call:
#> lm(formula = wleread ~ isei + female, data = pisa2000)
#> 
#> Coefficients:
#> (Intercept)         isei       female  
#>     435.427        1.678       27.721  
#>