Chapter 3 Programming prerequisites

This chapter gives a quick overview of the prerequisite R skills needed to use this book (we studied these last year). We will use these skills this year, so you may need to spend some time revising them if you feel that you’re a little rusty.

The key R skills you need to have in place to use this book will be revised in the first two practical sessions. The lecture slides will be placed on MOLE after the practical.

Transfer students

The material in this chapter won’t make any sense at the moment if you are an Environmental Sciences student joining us from Geography, or a student transferring into APS from a different department. Don’t panic! You will have the opportunity to catch up in the first few weeks.

3.1 Starting an R session in RStudio

Here is the process you should go through every time you start a new R session:

  1. Open up RStudio and set your working directory. Do this via the RStudio menu system: Session ▶ Set Working Directory ▶ Choose Working Directory…. Be sure to choose a sensible location. The working directory is where R will look for data files and R scripts. It’s simplest to use the same working directory in every practical, but it isn’t necessary to do this.

  2. Open up a new R script using the RStudio menu system: File ▶ New File ▶ R Script. This will create (but not save) an empty, unnamed R script. Don’t create any other kind of file unless you already know how to use R Notebooks or R Markdown files.

  3. There are a few things that regularly appear at the start of a script, e.g. we often start by clearing the workspace with rm and loading packages with library. Add this preamble chunk of code (and comments!) before doing anything else:

# clear the workspace so that we have a 'clean sheet'
rm(list = ls())

# load and attach the packages we want to use...
# 1. 'dplyr' for data manipulation
library(dplyr)
# 2. 'ggplot2' for plotting
library(ggplot2)
  4. Now run the preamble section of the script, i.e. highlight everything and hit Ctrl+Enter. If the library commands didn’t work, the relevant package probably isn’t installed yet. Install the package (see below) and try rerunning the script.

  5. Look at the label of the tab the script lives in. This will probably be called something like Untitled1 and the label text will be red. This is RStudio signalling that the file has not yet been saved. So once the preamble part of the script is working, save the script in the working directory!

Now we’re ready to start developing the script.
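The working directory chosen through the menus in step 1 can also be checked from the Console. The following is a minimal sketch using base R functions only; the name current_wd is just an illustrative choice, and the folder path in the commented-out setwd line is hypothetical.

```r
# report the folder R is currently using as its working directory
current_wd <- getwd()
print(current_wd)

# list any CSV files and R scripts that R can see in that folder
list.files(current_wd, pattern = "\\.(csv|R)$")

# the working directory can also be set from code rather than the menus
# (hypothetical path, so it is left commented out):
# setwd("path/to/my/practicals")
```

If the file listing does not show the data files you expected, the working directory is probably not the folder you meant to choose.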

3.2 Using packages

R packages extend the basic functionality of R so that we can do more with it. In a nutshell, an R package bundles together R code, data, and documentation in a standardised way that is easy to use and share with other users. This book uses a subset of the tidyverse ecosystem of packages: the dplyr package for data manipulation, and the ggplot2 package for making plots. We need to understand how R’s package system works to use these.

Here’s the key point: Installing a package, and then loading and attaching the package, are different operations. We only have to install a package once onto our computer, but we have to load and attach the package every time we want to use it in a new R session (i.e. every time we start RStudio). If that doesn’t make any sense, revise the package system chapter in the APS 135 course book.

Installing a package can be done via the install.packages function, e.g.

install.packages("dplyr")

There’s no need to leave install.packages statements in an R script. Loading and attaching a package so that it can actually be used happens via the library function, e.g.

library("dplyr")

We do often leave library statements in scripts.
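To see the difference between installing and attaching in practice, it can help to check which packages are already installed before calling library. Here is a minimal sketch of one way to do this with the base R function requireNamespace; the object names pkgs and is_installed are illustrative choices, not something used elsewhere in this book.

```r
# the two packages this book relies on
pkgs <- c("dplyr", "ggplot2")

# requireNamespace returns TRUE if a package is installed and can be loaded,
# and FALSE otherwise; it does not attach the package
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# a named logical vector: one TRUE or FALSE per package
print(is_installed)
```

Any package that shows FALSE needs to be installed once with install.packages before library will work for it.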

3.3 Reading data into R

Last year we made extensive use of data sets that reside inside various R packages. This meant we could use the data without getting bogged down trying to read it into R. We don’t have the luxury of this shortcut when we work with our own data, so we’ll adopt more realistic practices in this book. Whenever we need to work with a data set, we will first have to save a copy of it onto our computer, and then read it into R. Every data set we use is stored as a Comma Separated Value (‘CSV’) text file. The read.csv function can be used to read in such files. The resulting data object in R is a data frame. A data frame is a table-like object that collects together different variables, storing each of them as a named column. We can access the data inside the data frame by referring to particular columns and rows, or manipulate the whole data frame with a package like dplyr.

If that last paragraph was confusing, it would be a good idea to work through the data frames chapter in the APS 135 course book.
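As a reminder of what reading a CSV file looks like, here is a small self-contained sketch. It is not one of the course data sets: the file is created on the fly in a temporary location, and the names csv_path, plants, plant_id, height_cm and treatment are all made up for illustration. In a practical, you would instead give read.csv the name of a file saved in your working directory.

```r
# write a tiny made-up data set to a temporary CSV file (illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("plant_id,height_cm,treatment",
    "1,12.5,control",
    "2,15.1,fertiliser",
    "3,11.8,control"),
  csv_path
)

# read the CSV file into R; the result is a data frame
plants <- read.csv(csv_path)

# look at the structure: one named column per variable
str(plants)

# access a single column by name
plants$height_cm

# access a single row (here the second) by position
plants[2, ]
```

The first line of the file supplies the column names, and each later line becomes one row of the data frame.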