Chapter 24 Doing more with ggplot2

Throughout this block we have learnt a range of ways to plot our data using ggplot2. Here we provide a few more examples of ways you may wish to customise your plots. You will not be examined on the material in this chapter, but it will be helpful when you have to make plots for other modules, such as during your Level 1 projects.

We will again use the storms data set from the nasaweather package. As in the Relationships between two variables chapter, we will reorder the levels of the type variable so that the storm types run from least to most fierce.

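# Load the packages; nasaweather is loaded last so that storms refers to its
# version of the data (dplyr also contains a data set called storms)
library(dplyr)
library(ggplot2)
library(nasaweather)
# 1. make a vector of storm type names in the required order
storm_names <- c("Tropical Depression", "Extratropical", "Tropical Storm", "Hurricane")
# 2. now convert type to a factor
storms_alter <- 
  storms %>% 
  mutate(type = factor(type, levels = storm_names)) 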
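
It is worth checking that the conversion has worked as intended. The following is a minimal sketch of such a check, assuming the nasaweather and dplyr packages are installed; it refers to nasaweather::storms explicitly so that it does not depend on the order in which packages were loaded.

```r
library(dplyr)

storm_names <- c("Tropical Depression", "Extratropical", "Tropical Storm", "Hurricane")
storms_alter <- 
  nasaweather::storms %>% 
  mutate(type = factor(type, levels = storm_names))

# The levels should appear in the order we supplied
levels(storms_alter$type)
# Any storm type missing from storm_names (e.g. through a typo) becomes NA,
# so count the missing values to make sure nothing was lost
sum(is.na(storms_alter$type))
```

If the second line reports anything other than zero, one of the names in storm_names does not match the values in the data.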

24.1 Adding error bars

In the Building in Complexity chapter we learnt how to make a bar chart showing the means of our data. However, we generally want to show how variable the data are as well as their central tendency (e.g. the mean). To do this we can include error bars showing, for example, the standard deviation or the standard error of the mean. We’ll demonstrate this using the storms data set again, by plotting the mean and standard deviation of wind speed for each storm type.

We start by calculating the means and standard deviations for each group.

storms_sum <- 
  storms_alter %>% 
  group_by(type) %>% 
  summarise(mean_wind = mean(wind), std = sd(wind))
storms_sum
## # A tibble: 4 x 3
##   type                mean_wind   std
##   <fct>                   <dbl> <dbl>
## 1 Tropical Depression      27.4  3.52
## 2 Extratropical            40.1 13.2 
## 3 Tropical Storm           47.3 11.1 
## 4 Hurricane                84.7 18.8

We can now use this data frame to make the plot, using geom_col to plot the means and the unsurprisingly named geom_errorbar to add the error bars.

ggplot(storms_sum, aes(x = type, y = mean_wind)) + 
  # Plot the means
  geom_col(fill = "orange") + 
  # Add the error bars
  geom_errorbar(aes(ymin = mean_wind - std, ymax = mean_wind + std), width = 0.1) + 
  # Flip the axes round to prevent labels overlapping
  coord_flip() + 
  # Use a more professional theme 
  theme_classic(base_size = 12) + 
  # Change the axes labels
  xlab("Storm Category") + ylab("Mean Wind Speed (mph)")

The ymin and ymax aesthetics of geom_errorbar give the lower and upper limits of the error bars. Here, we have plotted the mean +/- 1 standard deviation. We can change the width of the caps on the error bars using the width argument. Remember that if you include error bars on a plot you MUST state in the figure legend what they show (e.g. standard deviation, standard error of the mean, 95% confidence intervals).
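
The same approach works for other measures of variability. As a minimal sketch, assuming the nasaweather, dplyr and ggplot2 packages are installed, the standard error of the mean can be calculated as the standard deviation divided by the square root of the sample size. The names storms_se, se and plt_se are our own choices for this illustration.

```r
library(dplyr)
library(ggplot2)

# Refer to the nasaweather version of storms explicitly
storms_se <- 
  nasaweather::storms %>% 
  group_by(type) %>% 
  summarise(mean_wind = mean(wind), 
            # standard error of the mean = standard deviation / sqrt(sample size)
            se = sd(wind) / sqrt(n()))

plt_se <- ggplot(storms_se, aes(x = type, y = mean_wind)) + 
  # Plot the means
  geom_col(fill = "orange") + 
  # Add error bars showing the mean +/- 1 standard error
  geom_errorbar(aes(ymin = mean_wind - se, ymax = mean_wind + se), width = 0.1) + 
  coord_flip() + 
  theme_classic(base_size = 12) + 
  xlab("Storm Category") + ylab("Mean Wind Speed (mph)")
plt_se
```

Because each storm type has several hundred observations, the standard errors are much smaller than the standard deviations, so these error bars will be far shorter than those in the plot above.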

24.2 Adding text to plots

There may be some cases in which you want to add text to plots, for example to show the sample size for each group or to show which categories are significantly different from each other if you’ve performed a statistical test (we’ll come back to this at Level 2).

Here we’re going to add a label for each bar on our bar chart. To do this we start by adding the labels that we want to use to the data frame. For example, here we will calculate the mean (using the mean function) and the sample size (using the function n) for each group.

storms_sum <- 
  storms %>% 
  group_by(type) %>% 
  summarise(mean_wind = mean(wind), samp = n())
storms_sum
## # A tibble: 4 x 3
##   type                mean_wind  samp
##   <chr>                   <dbl> <int>
## 1 Extratropical            40.1   412
## 2 Hurricane                84.7   896
## 3 Tropical Depression      27.4   513
## 4 Tropical Storm           47.3   926

Then we can add the text showing the sample size to our plot using the function geom_text.

ggplot(storms_sum, aes(x = type, y = mean_wind)) +
  # Add the bars
  geom_col(fill = "orange") +
  # Flip the axes round
  coord_flip() +
  # Change the axes labels
  xlab("Storm Category") + ylab("Mean Wind Speed (mph)") +
  # Add the text
  geom_text(aes(label = samp, y = 10)) + 
  # Use a more professional theme
  theme_classic(base_size = 12)
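
The label does not have to be a bare number, and it does not have to sit at a fixed position. The sketch below, which assumes the nasaweather, dplyr and ggplot2 packages are installed, builds a label such as "n = 412" with paste0 and places it just beyond the end of each bar. The names storms_lab, samp_label and plt_lab, and the offsets used to position the text, are illustrative choices rather than fixed rules.

```r
library(dplyr)
library(ggplot2)

storms_lab <- 
  nasaweather::storms %>% 
  group_by(type) %>% 
  summarise(mean_wind = mean(wind), samp = n()) %>% 
  # Build the text for each label, e.g. "n = 412"
  mutate(samp_label = paste0("n = ", samp))

plt_lab <- ggplot(storms_lab, aes(x = type, y = mean_wind)) + 
  geom_col(fill = "orange") + 
  coord_flip() + 
  xlab("Storm Category") + ylab("Mean Wind Speed (mph)") + 
  # Start each label slightly past the end of its bar (hjust = 0 left-aligns the text)
  geom_text(aes(label = samp_label, y = mean_wind + 2), hjust = 0) + 
  # Extend the wind speed axis so that the longest bar's label has room
  expand_limits(y = 100) + 
  theme_classic(base_size = 12)
plt_lab
```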

24.3 Customising text

Sometimes we may want to change the appearance of the text on the plot. For example, long axis labels may be bunched together or overlap each other, making them difficult to read. We saw this before in the Exploring Categorical Variables chapter.

ggplot(storms_alter, aes(x = type)) + 
  geom_bar(fill = "orange", width = 0.7) + 
  xlab("Storm Type") + ylab("Number of Observations")

Here it is very difficult to read the categories on the \(x\) axis because the text overlaps. In the Exploring Categorical Variables chapter we saw one way to deal with this: using coord_flip to swap the axes. An alternative is to change the size of the text. A simple way to do this is to use the base_size argument of one of the ggplot2 theme_XX functions, as follows:

ggplot(storms, aes(x = type)) + 
  geom_bar(fill = "orange", width = 0.7) + 
  xlab("Storm Type") + ylab("Number of Observations") +
  theme_classic(base_size = 10)

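The base_size argument sets the base font size of the theme, and the rest of the theme text in the plot (axis titles, tick labels, legends and so on) is scaled relative to it. Text drawn by a geom, such as geom_text, is not affected.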

It is also possible to rotate the labels themselves rather than the whole plot. Here, we use the angle argument of the element_text function inside theme.

ggplot(storms, aes(x = type)) + 
  geom_bar(fill = "orange", width = 0.7) + 
  xlab("Storm Type") + ylab("Number of Observations") +
  theme(axis.text.x = element_text(angle = 90))

Here we used the argument ‘axis.text.x’ so that only the labels on the \(x\) axis were rotated.
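
Rotated labels often look better at a shallower angle and aligned to their tick marks. The following is a minimal sketch of this, assuming the nasaweather and ggplot2 packages are installed; the 45 degree angle and the name plt_rot are illustrative choices.

```r
library(ggplot2)

plt_rot <- ggplot(nasaweather::storms, aes(x = type)) + 
  geom_bar(fill = "orange", width = 0.7) + 
  xlab("Storm Type") + ylab("Number of Observations") + 
  # Apply the complete theme first...
  theme_classic(base_size = 12) + 
  # ...then rotate the x axis labels and right-align them with their tick marks
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plt_rot
```

The order matters here: a complete theme such as theme_classic replaces any theme settings added before it, so the call to theme comes last.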

24.4 Saving plots

When using RStudio, plots can be saved using the Export button. However, such plots are often pixelated. R also has a range of functions that can be used to save plots. For figures made with ggplot2 we can use the ggsave function.

For example, here we will create a scatter plot using the storms data set again.

ggplot(storms, aes(x = pressure, y = wind)) + 
  # Add the points
  geom_point() + 
  # Change the axis labels
  labs(x = "Atmospheric pressure (mbar)", y = "Wind speed (mph)")

Once you’re happy with the plot you can use the ggsave function to save it as follows:

ggsave("Stormsplot.pdf", height = 5, width = 5)

The first argument that this function takes is the name of the file to save. By default ggsave will save the last plot that you’ve made. If you have assigned a plot to a name, you can instead provide that name as the second argument (plot). R will save the file to your working directory unless you specify a different location using the path argument. If you do not specify the width and height arguments, ggsave will use the current size of your plotting window.

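To save a specific plot, assign it to a name and pass that name to ggsave using the plot argument. Do not add ggsave to the plot with +, as recent versions of ggplot2 give an error if you do.

# Make the plot and assign it to a name
storms_plot <- ggplot(storms, aes(x = pressure, y = wind)) + 
  # Add the points
  geom_point() + 
  # Change the axis labels
  labs(x = "Atmospheric pressure (mbar)", y = "Wind speed (mph)")
# Save the figure
ggsave("Stormsplot.pdf", plot = storms_plot, height = 5, width = 5)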
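
ggsave chooses the file format from the file extension, so the same function can also write image formats such as PNG. As a minimal sketch, assuming the nasaweather and ggplot2 packages are installed, the code below saves a named plot as a PNG file and uses the dpi argument to set the resolution. The names storms_scatter and out_file are our own, and the file is written to a temporary folder purely so that the example does not clutter your working directory.

```r
library(ggplot2)

storms_scatter <- ggplot(nasaweather::storms, aes(x = pressure, y = wind)) + 
  geom_point() + 
  labs(x = "Atmospheric pressure (mbar)", y = "Wind speed (mph)")

# Illustrative location: a file in R's temporary folder
out_file <- file.path(tempdir(), "Stormsplot.png")

# Save the named plot; width and height are in inches, dpi sets the resolution
ggsave(out_file, plot = storms_scatter, width = 5, height = 5, units = "in", dpi = 300)

# Check that the file was written
file.exists(out_file)
```

A higher dpi gives a sharper image at the cost of a larger file, which helps to avoid the pixelation mentioned above.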

24.5 Panel plots

In the Introduction to ggplot2 chapter we saw how the facet_wrap function can be used to produce multiple panels. This function is useful when you want to make multiple plots, each showing a different level of a factor. However, sometimes you may wish to present a multi-panel plot using different variables in the different panels.

There are multiple ways to do this; we’re going to show you one using the cowplot package. First make sure that this package is installed (if you haven’t used it before) and loaded (every time you use it). Then make the individual plots that you want to include in your multi-panel plot using ggplot as normal. For example, we might want to look at a) the frequency with which the different storm types occur and b) the mean wind speed associated with each storm type. First we make the two plots that we want to include in the panel and assign them to names.

plta <- ggplot(storms, aes(x = type)) + 
  geom_bar(fill = "orange") + 
  xlab("Storm Type") + ylab("Number of Observations") +
  theme_classic(base_size = 10) 

pltb <- ggplot(storms_sum, aes(x = type, y = mean_wind)) + 
  # Plot the means
  geom_col(fill = "orange") + 
  # Change the axes labels
  xlab("Storm Category") + ylab("Mean Wind Speed (mph)") +
  theme_classic(base_size = 10)

Then we can use the plot_grid function from the cowplot package to create the multi-panel plot.

library(cowplot)
plot_grid(plta, pltb, nrow = 1, labels = "auto", label_size = 10)

The plot_grid function allows the panel to be customised easily, for example by changing the number of rows in the grid (nrow argument) and including labels for each panel (labels argument).
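
Other layouts follow the same pattern. The sketch below, which assumes the nasaweather, dplyr, ggplot2 and cowplot packages are installed, stacks the two panels in a single column, labels them with upper-case letters, and lines up their axes using the align argument. The object names are chosen for this illustration only.

```r
library(dplyr)
library(ggplot2)
library(cowplot)

storms_means <- 
  nasaweather::storms %>% 
  group_by(type) %>% 
  summarise(mean_wind = mean(wind))

plt_counts <- ggplot(nasaweather::storms, aes(x = type)) + 
  geom_bar(fill = "orange") + 
  coord_flip() + 
  xlab("Storm Type") + ylab("Number of Observations") + 
  theme_classic(base_size = 10)

plt_means <- ggplot(storms_means, aes(x = type, y = mean_wind)) + 
  geom_col(fill = "orange") + 
  coord_flip() + 
  xlab("Storm Type") + ylab("Mean Wind Speed (mph)") + 
  theme_classic(base_size = 10)

# One column, panels labelled A and B, axes aligned vertically
storms_stack <- plot_grid(plt_counts, plt_means, ncol = 1, 
                          labels = "AUTO", label_size = 10, align = "v")
storms_stack
```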

We can then use the ggsave function to save our multi-panel plot as before.

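# Create the multi-panel plot and assign it to a name
storms_panel <- plot_grid(plta, pltb, nrow = 1, labels = "auto", label_size = 10)
# Save it (ggsave cannot be added to a plot with +, so pass the plot by name)
ggsave("Stormsplot.pdf", plot = storms_panel, height = 4, width = 8)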