When teaching stats to non-statisticians, I am reluctant to use example data that are completely unrelated to the students’ field of interest. As my teaching is nearly always addressed to health care professionals (mainly doctors and nurses), I am always looking for clinical datasets. Unfortunately, the example datasets frequently used in R tutorials (like iris, cars, etc.), while very handy, are of little interest to clinicians. That’s why I decided to look for clinical datasets included in R packages.
To build the collection of clinical datasets, I started from this previous collection of datasets (thanks to Vincent Arel-Bundock), and from an extremely useful script to find datasets available in all installed packages (thanks to Saghir Bashir). From these, I retained a dataset if the data:
- refers to a medical area/topic (or is otherwise familiar to meds), and
- is observed or measured at the patient (or person) level, and
- is provided as a data frame
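The last criterion can be checked programmatically. What follows is only a minimal sketch in base R, not Saghir Bashir's script: `list_df_datasets` is a hypothetical helper name, and it assumes the package is installed and that its datasets can be loaded with `data()`. The first two criteria (medical topic, patient-level observations) still need a human reading of each dataset's help page.

```r
# Hypothetical helper (not part of any package): return the names of the
# datasets shipped with one installed package that are data frames.
list_df_datasets <- function(pkg) {
  # data(package = ) returns a character matrix with an "Item" column.
  items <- data(package = pkg)$results[, "Item"]

  # An item such as "beaver1 (beavers)" means: object beaver1, stored in
  # the data file beavers. Split the two parts.
  objs  <- sub(" .*$", "", items)
  files <- ifelse(grepl("\\(", items),
                  sub("^.*\\((.*)\\)$", "\\1", items),
                  items)

  # Load each dataset into a scratch environment and test its class.
  env <- new.env()
  keep <- vapply(seq_along(objs), function(i) {
    suppressWarnings(data(list = files[i], package = pkg, envir = env))
    exists(objs[i], envir = env, inherits = FALSE) &&
      is.data.frame(get(objs[i], envir = env))
  }, logical(1))

  objs[keep]
}

# Example: data-frame datasets in the base "datasets" package.
head(list_df_datasets("datasets"))
```

To scan more than one package, the helper could be applied over a vector of package names with `lapply()`; running it over every installed package may be slow.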
This selection resulted in a collection of 78 datasets, which are listed in the table below. Please be aware that this collection is by no means exhaustive, since I did not explore all R packages (not even all those on CRAN!). However, I thought it might be of interest to those who, like me, teach stats with R to meds.