JuliaStats/RDatasets.jl

RDatasets.jl

The RDatasets package provides an easy way for Julia users to experiment with most of the standard data sets available in the core of R, as well as data sets included with many of R's most popular packages. It is essentially a simple port of the Rdatasets repository created by Vincent Arel-Bundock, who gathered data sets from many of the standard R packages in one convenient location on GitHub at https://github.com/vincentarelbundock/Rdatasets

Data sets are loaded as DataFrames, so loading one requires the DataFrames package. DataFrames is installed automatically as a dependency when you install RDatasets as follows:

using Pkg
Pkg.add("RDatasets")

After installing the RDatasets package, you can then load data sets using the dataset() function, which takes the name of a package and a data set as arguments:

using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")
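Each call to dataset() returns an ordinary DataFrame, so the usual DataFrames.jl functions apply to it. The sketch below assumes DataFrames and the Statistics standard library are loaded alongside RDatasets, and that iris uses the column names RDatasets assigns (SepalLength, SepalWidth, PetalLength, PetalWidth, Species). It prints the table's dimensions and computes the mean petal length for each species:

```julia
using RDatasets, DataFrames, Statistics

iris = dataset("datasets", "iris")

# Dimensions (rows, columns) and column names of the returned DataFrame
println(size(iris))
println(names(iris))

# Split-apply-combine: group by species, then average petal length per group
by_species = combine(groupby(iris, :Species), :PetalLength => mean => :MeanPetalLength)
println(by_species)
```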

Data Sets

The RDatasets.packages() function returns a table of the R packages represented in RDatasets:

Package | Title
COUNT | Functions, data and code for count data.
Ecdat | Data sets for econometrics
HSAUR | A Handbook of Statistical Analyses Using R (1st Edition)
HistData | Data sets from the history of statistics and data visualization
ISLR | Data for An Introduction to Statistical Learning with Applications in R
KMsurv | Data sets from Klein and Moeschberger (1997), Survival Analysis
MASS | Support Functions and Datasets for Venables and Ripley's MASS
SASmixed | Data sets from "SAS System for Mixed Models"
Zelig | Everyone's Statistical Software
adehabitatLT | Analysis of Animal Movements
boot | Bootstrap Functions (Originally by Angelo Canty for S)
car | Companion to Applied Regression
cluster | Cluster Analysis Extended Rousseeuw et al.
datasets | The R Datasets Package
gamair | Datasets used in the book Generalized Additive Models: An Introduction with R
gap | Genetic analysis package
ggplot2 | An Implementation of the Grammar of Graphics
lattice | Lattice Graphics
lme4 | Linear mixed-effects models using Eigen and S4
mgcv | Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation
mlmRev | Examples from Multilevel Modelling Software Review
nlreg | Higher Order Inference for Nonlinear Heteroscedastic Models
plm | Linear Models for Panel Data
plyr | Tools for splitting, applying and combining data
pscl | Political Science Computational Laboratory, Stanford University
psych | Procedures for Psychological, Psychometric, and Personality Research
quantreg | Quantile Regression
reshape2 | Flexibly Reshape Data: A Reboot of the Reshape Package.
robustbase | Basic Robust Statistics
rpart | Recursive Partitioning and Regression Trees
sandwich | Robust Covariance Matrix Estimators
sem | Structural Equation Models
survival | Survival Analysis
vcd | Visualizing Categorical Data
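Because RDatasets.packages() returns a DataFrame, the package list can be searched like any other table. A minimal sketch, assuming the columns are named Package and Title as in the table above: it looks up one package by name, then does a case-insensitive keyword search over the titles.

```julia
using RDatasets, DataFrames

pkgs = RDatasets.packages()

# Exact lookup of a single package by name (assumes a column named Package)
boot_row = filter(:Package => ==("boot"), pkgs)
println(boot_row.Title)

# Case-insensitive keyword search over package titles (assumes a column named Title)
survival_pkgs = filter(:Title => t -> occursin("survival", lowercase(t)), pkgs)
println(survival_pkgs.Package)
```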

The RDatasets.datasets() function returns a table describing the 700+ included datasets. Pass a package name, e.g. RDatasets.datasets("mlmRev"), to restrict the table to that package:

Package | Dataset | Title | Rows | Columns
mlmRev | Chem97 | Scores on A-level Chemistry in 1997 | 31022 | 8
mlmRev | Contraception | Contraceptive use in Bangladesh | 1934 | 6
mlmRev | Early | Early childhood intervention study | 309 | 4
mlmRev | Exam | Exam scores from inner London | 4059 | 10
mlmRev | Gcsemv | GCSE exam score | 1905 | 5
mlmRev | Hsb82 | High School and Beyond - 1982 | 7185 | 8
mlmRev | Mmmec | Malignant melanoma deaths in Europe | 354 | 6
mlmRev | Oxboys | Heights of Boys in Oxford | 234 | 4
mlmRev | ScotsSec | Scottish secondary school scores | 3435 | 6
mlmRev | bdf | Language Scores of 8-Graders in The Netherlands | 2287 | 28
mlmRev | egsingle | US Sustaining Effects study | 7230 | 12
mlmRev | guImmun | Immunization in Guatemala | 2159 | 13
mlmRev | guPrenat | Prenatal care in Guatemala | 2449 | 15
mlmRev | star | Student Teacher Achievement Ratio (STAR) project data | 26796 | 18
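The catalogue returned by RDatasets.datasets() is also a DataFrame, so it can be filtered before any data set is loaded. A minimal sketch, assuming the columns are named Package, Dataset, Title, Rows and Columns as in the table above: it lists the mlmRev data sets with more than 5,000 rows, largest first, and then loads one of them by passing its Package and Dataset values to dataset().

```julia
using RDatasets, DataFrames

mlm = RDatasets.datasets("mlmRev")

# Keep only the larger data sets, then sort by row count in descending order
large = sort(filter(:Rows => >(5000), mlm), :Rows, rev = true)
println(large[:, [:Dataset, :Rows, :Columns]])

# Load one entry: the arguments are its Package and Dataset values
star = dataset("mlmRev", "star")
println(size(star))
```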

Licensing and Intellectual Property

Following Vincent's lead, we have assumed that all of the data sets in this repository can be made available under the GPL-3 license. If you know that one of the datasets released here should not be released publicly, or that a data set can only be released under a different license, please contact me so that I can remove the data set from this repository.