Chapter 5 Other kinds of vectors
5.1 Introduction
The data we collect and analyse are often in the form of numbers, so it comes as no surprise that we work with numeric vectors a lot in R. Nonetheless, we also sometimes need to represent other kinds of vectors, either to represent different types of data, or to help us manipulate our data. This chapter introduces two new types of atomic vector to help us do this: character vectors and logical vectors.
5.2 Character vectors
The elements of character vectors are what are known as a “character string” (or “string” if we are feeling lazy). The term “character string” refers a sequence of characters, such as “Treatment 1”, “University of Sheffield”, “Population Density”. A character vector is an atomic vector that stores an ordered collection of one or more character strings.
If we want to construct a character vector in R we have to place double ("
) or single ('
) quotation marks around the characters. For example, we can print the name “Dylan” to the Console like this:
"Dylan"
## [1] "Dylan"
Notice the [1]
. This shows that what we just printed is an atomic vector of some kind. We know it’s a character vector because the output is printed with double quotes around the value. We often need to make simple character vectors containing only one value—for example, to set the values of arguments to a function.
The quotation marks are not optional—they tell R we want to treat whatever is inside them as a literal value. The quoting is important. If we try to do the same thing as above without the quotes we end up with an error:
Dylan
## Error in eval(expr, envir, enclos): object 'Dylan' not found
What happened? When the interpreter sees the word Dylan
without quotes it assumes that this must be the name of a variable, so it goes in search of it in the global environment. We haven’t made a variable called Dylan, so there is no way to evaluate the expression and R spits out an error to let us know there’s a problem.
Longer character vectors are typically constructed to represent data of some kind. The c
function is often a good starting point for this kind of thing:
# make a length-3 character vector
my_name <- c("Dylan", "Zachary", "Childs")
my_name
## [1] "Dylan" "Zachary" "Childs"
Here we made a length-3 character vector, with elements corresponding to a first name, middle name, and last name. If we want to extract one or more elements from a character vector by their position
Take note, this is not equivalent to the above :
my_name <- c("Dylan Zachary Childs")
my_name
## [1] "Dylan Zachary Childs"
The only element of this character vector is a single character string containing the first, middle and last name separated by spaces. We didn’t need to use the the c
function here because we were only ever working with a length-1 character vector. i.e. we could have typed "Dylan Zachary Childs"
and we would have ended up with exactly the same text printed at the Console.
We can construct more complicated, repeating character vectors with rep
. This works on character vectors in exactly the same way as it does on numeric vectors:
c_vec <- c("Dylan", "Zachary", "Childs")
rep(c_vec, each = 2, times = 3)
## [1] "Dylan" "Dylan" "Zachary" "Zachary" "Childs" "Childs" "Dylan"
## [8] "Dylan" "Zachary" "Zachary" "Childs" "Childs" "Dylan" "Dylan"
## [15] "Zachary" "Zachary" "Childs" "Childs"
Each element was replicated twice (each = 2
), and then the whole vector was replicated three times (times = 3
), end to end.
What about the seq
function? Hopefully it is fairly obvious that we can’t use this function to build a character vector. The seq
function is designed to make sequences of numbers, from a starting value, to another. The notion of a sequence of character strings – for example, from "Dylan"
to "Childs"
– is meaningless.
5.3 Logical vectors
The elements of logical vectors only take two values: TRUE
or FALSE
. Don’t let the simplicity of logical vectors fool you. They’re very useful. As with other kinds of atomic vectors the c
and rep
functions can be used to construct a logical vector:
l_vec <- c(TRUE, FALSE)
rep(l_vec, times = 2)
## [1] TRUE FALSE TRUE FALSE
Hopefully nothing about that output is surprising by this point.
So why are logical vectors useful? Their allow us to represent the results of questions such as, “is x greater than y” or “is x equal to y”. The results of such comparisons may then be used to carry out various kinds of subsetting operations.
Let’s first look at how we use logical vectors to evaluate comparisons. Before we can do that though we need to introduce relational operators. These sound fancy, but they are very simple: we use relational operators to evaluate the relative value of vector elements. Six are available in R:
x < y
: is x less than y?x > y
: is x greater than y?x <= y
: is x less than or equal to y?x >= y
: is x greater than or equal to y?x == y
: is x equal to y?x != y
: is x not equal to y?
The easiest way to understand how these work is to simply use them. We need a couple of numeric variables first:
x <- 11:20
y <- seq(3, 30, by = 3)
x
## [1] 11 12 13 14 15 16 17 18 19 20
y
## [1] 3 6 9 12 15 18 21 24 27 30
Now, if we need to evaluate and represent a question like, “is x greater than than y”, we can use either <
or >
:
x > y
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
The x > y
expression produces a logical vector, with TRUE
values associated with elements in x
are less than y
, and FALSE
otherwise. In this example, x is less than y until we reach the value of 15 in each sequence. Notice that relational operators are vectorised: they work on an element by element basis. They wouldn’t be much use if this were not the case.
What does the ==
operator do? It compares the elements of two vectors to determine if they are exactly equal:
x == y
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
The output of this comparison is true only for one element, the number 15, which is at the 5th position in both x
and y
. The !=
operator is essentially the opposite of ==
. It identifies cases where two elements are not exactly equal. We could step through each of the different the other relational operators, but hopefully they are self-explanatory at this point (if not, experiment with them).
=
and ==
are not the same
If we want to test for equivalence between the elements of two vectors we must use double equals (==
), not single equals (=
). Forgetting to do this ==
instead of =
is a very common source of mistakes. The =
symbol already has a use in R—assigning name-value pairs—so it can’t also be used to compare vectors because this would lead to ambiguity in our R scripts. Using =
when you meant to use ==
is a very common mistake. If you make it, this will lead to all kinds of difficult-to-comprehend problems with your scripts. Try to remember the difference!