regdatasets 0.2.0
Breaking Changes
- `naep` dataset removed: The NAEP 2005 Grade 8 Mathematics dataset (17,606 x 301) has been removed because respondent-level NAEP microdata with weights are restricted-use under NCES data licensing policy. The package now contains 24 datasets. For NAEP analyses, use the `EdSurvey` R package to access NAEP data directly from NCES.
- `berkeley$departme` renamed to `department`: The truncated Stata 8-character variable name has been corrected. Code using `berkeley$departme` will need to be updated.
- `gcse` reduced from 9 to 6 columns: Removed `u0`, `u1`, and `filter__` columns, which contained Stata multilevel model artifacts (placeholder values of -1e30). Remaining columns: `school`, `student`, `gcse`, `lrt`, `gender`, `pred`.
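For code that has to run against both the old and the new column layout, a small compatibility shim is one option. The sketch below uses toy stand-in data frames rather than the packaged datasets; the helper names `get_department()` and `drop_gcse_artifacts()` and the `admit` column are hypothetical and not part of regdatasets.

```r
# Hypothetical helper (not part of regdatasets): return the department column
# under either its 0.2.0 name or its truncated pre-0.2.0 name.
# `[[` is used because it matches names exactly, unlike `$` on a data.frame.
get_department <- function(df) {
  if ("department" %in% names(df)) {
    df[["department"]]
  } else if ("departme" %in% names(df)) {
    df[["departme"]]
  } else {
    stop("no department column found")
  }
}

# Hypothetical helper: drop the three Stata artifact columns if present,
# which leaves older copies of gcse with the 0.2.0 column set.
gcse_artifacts <- c("u0", "u1", "filter__")
drop_gcse_artifacts <- function(df) {
  df[, setdiff(names(df), gcse_artifacts), drop = FALSE]
}

# Toy stand-ins for the old and new layouts (values are made up).
old_berkeley <- data.frame(departme = c("A", "B"), admit = c(1, 0))
new_berkeley <- data.frame(department = c("A", "B"), admit = c(1, 0))
old_gcse <- data.frame(
  school = 1, student = 1, gcse = 0.5, lrt = 0.1, gender = 0, pred = 0.4,
  u0 = -1e30, u1 = -1e30, filter__ = 1
)

same_department <- identical(get_department(old_berkeley),
                             get_department(new_berkeley))
cleaned_names <- names(drop_gcse_artifacts(old_gcse))
```

Once all callers are on 0.2.0, the shim can be dropped in favor of plain `berkeley$department`.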
Data Cleaning
- Sentinel missing values recoded to `NA` in `disc` (variables `bys34a`, `bys34b`: codes 8 and 98) and `nels_data` (82 sentinel values across multiple variables: codes 99.98, 99.99, 99.998, 998, 100).
- `titanic$embarked`: Two blank strings converted to `NA`.
- `disc$race` and `disc2$race`: Converted from integer codes (1-4) to unordered factor with levels: Asian, Hispanic, Black, White.
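The three kinds of cleaning described above can be illustrated in base R. This is a minimal sketch on a toy data frame, not the package's actual cleaning script: the helper `recode_sentinels()` is hypothetical, and the assumption that the integer codes 1 to 4 map to Asian, Hispanic, Black, White in that order is inferred from the listed level order only.

```r
# Hypothetical helper: replace sentinel codes in a vector with NA.
# Matching is exact, so it assumes the codes are stored exactly as listed.
recode_sentinels <- function(x, sentinels) {
  x[x %in% sentinels] <- NA
  x
}

# Toy data (made-up values) reusing column names mentioned in this section.
toy <- data.frame(
  bys34a   = c(1, 8, 3, 98),
  embarked = c("S", "", "C", "Q"),
  race     = c(1L, 4L, 2L, 3L),
  stringsAsFactors = FALSE
)

# 1. Sentinel codes 8 and 98 become NA.
toy$bys34a <- recode_sentinels(toy$bys34a, c(8, 98))

# 2. Blank strings become NA.
toy$embarked[toy$embarked == ""] <- NA

# 3. Integer codes become an unordered factor.
#    ASSUMPTION: 1 = Asian, 2 = Hispanic, 3 = Black, 4 = White.
toy$race <- factor(toy$race, levels = 1:4,
                   labels = c("Asian", "Hispanic", "Black", "White"))
```

After this runs, `is.na()` identifies the recoded cells, so functions with an `na.rm` argument can skip them instead of treating 98 or 998 as real measurements.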
New Features
- `list_datasets()` function: Returns a tibble summarizing all 24 datasets with name, dimensions, chapter mapping, and description.
- `LICENSE.note`: New file documenting data source provenance and redistribution basis for each dataset.
- `inst/CITATION`: Added so `citation("regdatasets")` returns a proper bibliographic entry.
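A quick way to try the two user-facing additions is sketched below. It assumes `list_datasets()` is exported and takes no required arguments, which this changelog implies but does not state; the guard makes the snippet a no-op when regdatasets 0.2.0 or later is not installed.

```r
# Only run the calls when regdatasets >= 0.2.0 is available.
has_regdatasets <- requireNamespace("regdatasets", quietly = TRUE) &&
  utils::packageVersion("regdatasets") >= "0.2.0"

if (has_regdatasets) {
  # ASSUMPTION: list_datasets() is exported and needs no arguments.
  overview <- regdatasets::list_datasets()
  print(overview)

  # Uses the new inst/CITATION file.
  print(citation("regdatasets"))
}
```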
Documentation Improvements
- Ethical context sections added to `penalty` (racial disparities in capital sentencing, Radelet 1981, McCleskey v. Kemp), `disc` and `disc2` (school-to-prison pipeline, race in school discipline), and `titanic` (note of respect for individuals represented).
- Cross-references (`@seealso`) added linking related dataset pairs: `berk_sub`/`berkeley`, `disc`/`disc2`, `hsb_sub`/`hsbs1`, plus `womenlf`/`carData::Womenlf` overlap disclosure and `berkeley`/`datasets::UCBAdmissions` comparison.
- `gcse` source citation completed: Goldstein, H., Rasbash, J., Yang, M., et al. (1993). Oxford Review of Education, 19(4), 425-433.
- Updated `disc`/`disc2` documentation to reflect sentinel value cleaning and race factor conversion.
Package Engineering
- Added `tibble` to `Imports` in `DESCRIPTION` to ensure correct deserialization of tibble objects.
- Expanded test suite: From 73 lines / 4 tests to comprehensive coverage of all 24 datasets (tibble class, dimensions, column names, factor levels, sentinel value absence, binary variable validation, Stata artifact absence).
- Added `NEWS.md` (this file).
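To give a sense of what per-dataset checks such as "column names" and "sentinel value absence" might look like, here is a base R sketch run on toy data. It is not the package's test suite, whose framework and layout are not described here; `check_dataset()` is a hypothetical helper.

```r
# Hypothetical checker: verify column names and the absence of sentinel
# values in numeric columns. Stops with an error on the first failure.
check_dataset <- function(df, expected_names, sentinels = numeric(0)) {
  stopifnot(is.data.frame(df))
  stopifnot(identical(names(df), expected_names))
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  for (col in names(numeric_cols)) {
    if (any(numeric_cols[[col]] %in% sentinels)) {
      stop("sentinel value found in column ", col)
    }
  }
  invisible(TRUE)
}

# Toy data (made-up values): one clean frame, one with a Stata placeholder.
clean <- data.frame(school = 1:2, gcse = c(0.2, -0.4))
dirty <- data.frame(school = 1:2, gcse = c(0.2, -1e30))

ok_clean <- check_dataset(clean, c("school", "gcse"), sentinels = -1e30)
ok_dirty <- tryCatch(
  check_dataset(dirty, c("school", "gcse"), sentinels = -1e30),
  error = function(e) FALSE
)
```

Here `ok_clean` is `TRUE`, while the `-1e30` placeholder in `dirty` triggers the error branch, so `ok_dirty` is `FALSE`.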
