Chapter 3 Programming prerequisites

This chapter gives a quick overview of the prerequisite R skills needed to use this book. These are covered in the Introduction to Exploratory Data Analysis with R course. We assume you are comfortable using these skills, so you may need to spend some time revising them if you feel that you’re a little rusty.

3.1 Starting a learnr tutorial

You will be using learnr tutorials to gain practical experience of what you are learning about each week. These interactive tutorials contain three main components:

1 - Static Text: This provides background information for revision purposes or instructions to do something.

2 - Code boxes: These are interactive boxes that allow you to execute R code and see the results.

3 - Quizzes: These are multiple-choice/multiple-answer questions designed to check your understanding.

You will be given a visual ‘walk-through’ of how to run a learnr tutorial in week one of the course. The first tutorial aims to be self-describing—it provides a stand-alone introduction to how to use the tutorials.

3.2 Using packages

R packages extend the basic functionality of R so that we can do more with it. In a nutshell, an R package bundles together R code, data, and documentation in a standardised way that is easy to use and share with other users. This book uses a subset of the tidyverse ecosystem of packages: the readr package for reading data into R, the dplyr package for data manipulation, and the ggplot2 package for making plots. We need to understand how R’s package system works to use these.

Here’s the key point: Installing a package, and then loading and attaching the package, are different operations. We only have to install a package once onto our computer, but we have to load and attach the package every time we want to use it in a new R session (i.e. every time we start RStudio). If that doesn’t make any sense, revise the package system chapter of the Exploratory Data Analysis in R book.

Installing a package can be done via the install.packages function, e.g. use this code to install the dplyr package:

install.packages("dplyr")

Alternatively, you can use RStudio’s menu interface via the packages tab in the bottom right window.

Either way is fine. However, the install.packages route should be carried out by typing the install commands directly into the Console (this is pretty much the only time we work this way). Do not leave install.packages statements in your R scripts. We only have to install a package once onto our computer to make it available. Because installing packages can be slow, we’d rather not do that every time we run a script.
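If you are not sure whether a package is already installed, you can check before installing. The sketch below is one way to do this; the helper name needs_install is made up for illustration and is not part of the book or of any package. It relies on base R's requireNamespace, which returns TRUE when a package can be found and loaded.

```r
# Hypothetical helper (not from the book): TRUE if a package is not yet installed.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# The stats package ships with every R installation, so this is FALSE.
needs_install("stats")

# Typed into the Console, this would install dplyr only when it is missing:
# if (needs_install("dplyr")) install.packages("dplyr")
```

The install line is left commented out here for the same reason given above: installation belongs in the Console, not in a script.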

Loading and attaching a package so that it can actually be used happens via the library function, e.g.

library("dplyr")

We usually do leave library statements in our scripts, placing them at the beginning to ensure that all the package functions we need are available to the rest of the script.
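One way to see what attaching does is to look at the output of search(), which lists everything currently attached to the session. The sketch below uses the splines package only because it ships with R and so needs no installation; that choice is an assumption made for illustration, and the same pattern applies to dplyr or any other installed package.

```r
# search() lists the packages currently attached to the session.
attached_before <- "package:splines" %in% search()

# Load and attach the package (it is already installed as part of R).
library("splines")

attached_after <- "package:splines" %in% search()

attached_before  # typically FALSE in a fresh R session
attached_after   # TRUE once library() has run
```

Restarting R clears the attached packages but leaves the installed ones in place, which is why library calls need to be run in every new session while install.packages does not.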

3.3 Reading data into R

Last year we made extensive use of ‘built in’ data sets that reside inside R. This meant we could use the data without getting bogged down trying to read it into R. We’ll carry on doing that at times as we work through the book and the accompanying learnr tutorials. However, we don’t have the luxury of this shortcut when we work with our own data, so we’ll work towards adopting more realistic practices as we go.

In ‘real world’ data analysis, when we need to work with data, we typically save a copy of it into some kind of file on our computer and then read that file into R. The data sets we use in this course are stored as Comma Separated Value (‘CSV’) text files. Either the base R read.csv function or the read_csv function from the readr package can be used to read in such files.

If we use read.csv, the resulting R data object is a data frame. If we use read_csv, we end up with a ‘tibble’, which can be thought of as the tidyverse version of a data frame. Either is fine; the differences don’t matter in this book. A data frame is a table-like object that collects together different variables, storing each of them as a named column. We can access the data inside the data frame by referring to particular columns and rows, or manipulate the whole data frame with a package like dplyr.

If that last paragraph was confusing, it would be a good idea to work through the data frames chapter of the Exploratory Data Analysis in R book.
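As a quick refresher, the sketch below reads a small CSV file and pulls out columns and rows. The data are made up and written to a temporary file so that the example runs on its own; the object name birds and the column names are assumptions for illustration, not data from this course.

```r
# Made-up data written to a temporary CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("species,mass_g",
    "robin,18.2",
    "wren,9.5",
    "blackbird,101.0"),
  csv_path
)

# Base R: the result is a data frame.
birds <- read.csv(csv_path)

# The readr equivalent would return a tibble (requires readr to be installed):
# birds <- readr::read_csv(csv_path)

str(birds)                     # one named column per variable
birds$mass_g                   # a single column, referred to by name
birds[2, ]                     # the second row, all columns
birds[birds$mass_g > 10, ]     # only the rows where mass_g exceeds 10
```

With your own data, csv_path would be replaced by the location of the CSV file on your computer.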