HSDS

The goal of HSDS is to make all the datasets from the book “A Handbook of Small Data Sets” (1994) by David J. Hand et al. available in R. These datasets are especially useful for demonstrating statistical methods, testing functions, or teaching statistics and R programming.

While the individual datasets are already available in a separate repository, they are not formatted for immediate use in R and lack documentation. This package addresses both issues by providing clean, fully documented datasets ready for analysis.

Do you like this package and want to support its development?

Installation

To install the development version of HSDS from GitHub, use the following command:

devtools::install_github("ABohynDOE/HSDS")

Available data sets

The book contains over 500 datasets. Currently, only 16 datasets (3%) have been processed and included in this package.

The table below summarizes 10 randomly selected datasets included so far, with details on their names, descriptions, structures, and variable types.

Name	Description	Structure	Variable types
`lengths`	Guessing lengths	113 × 3	factor (1), numeric (2)
`darwin`	Darwin’s cross-fertilized and self-fertilized plants	30 × 3	factor (1), integer (1), numeric (1)
`interval`	Intervals between cars on the M1 motorway	41 × 2	character (2)
`tearing`	Tearing factor for paper	20 × 2	numeric (2)
`abrasion`	Abrasion loss	30 × 3	numeric (3)
`chickens`	Weight of chickens	24 × 3	factor (2), numeric (1)
`chloride`	Effect of ammonium chloride on yield	32 × 5	factor (4), numeric (1)
`software`	Software system failures	136 × 2	integer (1), numeric (1)
`piston`	Piston-ring failures	12 × 3	character (1), integer (1), numeric (1)
`pastes`	Strength of chemical pastes	60 × 4	factor (3), numeric (1)
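The Structure and Variable types columns can be reproduced for any data frame. The sketch below is one possible way to do so, not necessarily how the table above was generated. `describe_dataset()` is a hypothetical helper that is not part of the package, and the built-in `iris` data set is used so that the code runs without HSDS installed.

```r
# Hypothetical helper: summarise a data frame in the style of the table above
describe_dataset <- function(df) {
  # Structure: number of rows, a multiplication sign, number of columns
  structure <- paste(nrow(df), "\u00d7", ncol(df))

  # Variable types: first class of each column, counted per class
  classes <- vapply(df, function(col) class(col)[1], character(1))
  counts <- table(classes)  # table() orders the classes alphabetically
  types <- paste0(names(counts), " (", as.integer(counts), ")", collapse = ", ")

  list(structure = structure, types = types)
}

res <- describe_dataset(iris)
res$structure  # "150 × 5"
res$types      # "factor (1), numeric (4)"
```

Once the package is loaded, the same call should work on any of its datasets, for example `describe_dataset(darwin)`.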

Example

Here’s a simple example demonstrating how to use one of the datasets to create a visualization:

library(hsds)
library(ggplot2)

# Box plots of seeds by water, coloured by box
ggplot(germin, aes(x = water, y = seeds, color = box)) +
  geom_boxplot(na.rm = TRUE) +  # drop missing values silently
  theme_bw()

Contributing to the package

We are far from reaching the goal of 500 datasets, so your contributions are more than welcome! If you’d like to help, all raw datasets are already available in the repository under data-raw/data-files. Feel free to clean one or more datasets and submit your contributions.

To simplify the contributing process, the package provides two helper functions:

data_list()
Use this function to list the datasets that have already been processed and identify the next datasets that need to be processed. This ensures efficient collaboration and avoids duplication of effort.
data_setup(data)
This function sets up all the necessary files for processing a new dataset. When you run data_setup(data), it generates three files, each named after the dataset (here, data.R) and placed in a different location:
- inst/examples/: Contains an example of usage for the dataset.
- data-raw/: Includes a script to process the raw dataset.
- R/: Documents the dataset for use in the package.
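As a rough illustration of what the processing script in data-raw/ might contain, here is a minimal sketch. It makes several assumptions: the raw file is whitespace-separated, and the object name `toy` and its column names are hypothetical. A temporary file stands in for a real file from data-raw/data-files so that the code runs on its own.

```r
# Hypothetical raw file standing in for one of the files in data-raw/data-files
raw_file <- tempfile(fileext = ".dat")
writeLines(c("A 12.5 3", "B 14.1 4", "A 11.8 2"), raw_file)

# Read the whitespace-separated values and give the columns meaningful names
toy <- read.table(raw_file, col.names = c("group", "weight", "count"))

# Store the grouping variable as a factor
toy$group <- factor(toy$group)

# Inside the package, the cleaned object would typically be saved with
# usethis::use_data(toy, overwrite = TRUE); a temporary folder is used here
out_file <- file.path(tempdir(), "toy.rda")
save(toy, file = out_file)
```

Saving with `usethis::use_data()` writes the object to the data/ folder of the package, which is the usual convention for datasets shipped with an R package.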

When contributing, please also follow these guidelines:

Dataset Naming
Name each dataset based on the data structure index provided in the book. The index is available here or in the Excel file data-raw/raw_data_index.xlsx.
Variable Labelling
Ensure that all variables in the dataset are properly labelled. Labels don’t have to be long but should be meaningful to a newcomer. You can use the labelled package or a similar tool to add these labels.
Documentation
Document each dataset using the corresponding text from the book to maintain consistency and provide clear context.
Examples of Usage
Add usage examples for each dataset you contribute. Save these examples as separate files in the inst/examples directory.
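The labelling and documentation guidelines above can be sketched as follows. This is only an illustration under stated assumptions: the dataset `toy`, its variables, and the label texts are hypothetical, and the roxygen2 block shows a common layout for documenting a dataset rather than the exact template produced by data_setup().

```r
library(labelled)

# Hypothetical toy dataset; the variable names are illustrative only
toy <- data.frame(
  group = factor(c("A", "B", "A")),
  weight = c(12.5, 14.1, 11.8)
)

# Attach a short, meaningful label to each variable
var_label(toy) <- list(
  group = "Treatment group",
  weight = "Weight in grams"
)

# Retrieve the labels as a named list
labels <- var_label(toy)

# A typical roxygen2 skeleton for the matching file in R/ (placeholder text):
#' Toy dataset
#'
#' Description taken from the corresponding text of the book.
#'
#' @format A data frame with 3 rows and 2 variables:
#' \describe{
#'   \item{group}{Treatment group}
#'   \item{weight}{Weight in grams}
#' }
"toy"
```

Keeping the labels and the `@format` entries identical makes it easier to check that the documentation matches the data.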

Your contributions will help us expand this resource and make it even more valuable for the community. Thank you for your support!
