4 Atomic Vectors
While everything in R is an object, the vast majority of the objects are vectors. Vectors are often called the building blocks of R objects. This chapter focuses on the homogeneous version of vectors, atomic vectors. The next chapter introduces heterogeneous vectors, that is, generic vectors, often called lists.1 Because atomic vectors are homogeneous, we will need to introduce and discuss object types.
After reading this chapter you should be able to:
- Determine an object’s type.
- Create vectors of a particular type.
- Determine the length of a vector.
- Use a vector’s index to extract a single element.
- Use vectorized operations.
4.1 Object Types
Every object in R has a type. To determine an object’s type, you can use the typeof() function.
typeof("Hello, World!")
#> [1] "character"
typeof(4.2)
#> [1] "double"
typeof(42L)
#> [1] "integer"
typeof(TRUE)
#> [1] "logical"
typeof(2 + 3i)
#> [1] "complex"
typeof(raw(1))
#> [1] "raw"
The above demonstrates the six types of atomic vectors.
- Character
- Double
- Integer
- Logical
- Complex
- Raw
We will largely ignore complex and raw, as you will rarely encounter them.2 We will focus on character, double, integer, and logical vectors.
We’ve already seen how to create larger (longer) vectors through the use of the c() function. For example:
c(4.2, 6.1, 1.3)
#> [1] 4.2 6.1 1.3
This is a double vector. Additionally, each element of this (atomic)3 vector, 4.2, 6.1, and 1.3, is also a double vector. This is because R does not have a notion of a scalar. Instead, they are length one vectors.
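One way to convince yourself of this is to ask R about a single number directly. The following is a small sketch that uses only base R functions (length() is discussed in more detail later in this chapter):

```r
# A lone number is already a vector of length one
length(4.2)
#> [1] 1
is.vector(4.2)
#> [1] TRUE

# Wrapping a single value in c() changes nothing
identical(c(4.2), 4.2)
#> [1] TRUE
```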
4.1.1 Character Vectors
Character vectors are used to store text strings.4
To create character vectors, you can use either single (') or double (") quotation marks, so long as the opening and closing quotation marks match.
typeof("Hello, World!")
#> [1] "character"
typeof('STAT 385')
#> [1] "character"
The quotation marks are needed, because without them, R will assume you are trying to reference an object by name.
"foo" # a character vector, runs
#> [1] "foo"
foo # trying to recall an object via the name foo
#> Error in eval(expr, envir, enclos): object 'foo' not found
As expected, we can use the c() function to store larger (longer) character vectors. Note that this book, and RStudio, will often provide some syntax highlighting that helps understand the following code. In this case, the strings, including their quotation marks, are a different color than the rest of the code used to create the object.
c("This", "is", "a", "long", "character", "vector.")
#> [1] "This" "is" "a" "long" "character" "vector."
Like any object, we can assign them names.
foo = "Hello, World!"
bar = "STAT 385"
baz = c("This", "is", "a", "long", "character", "vector.")
We can then use those names (or the code to create the objects) to verify the types of the objects that the names are assigned to. Remember, we are not technically checking the type of foo,5 we are checking the type of the object that currently has the name foo assigned to it. We’ll eventually relax and not be so pedantic, but it’s an important distinction.
c(typeof(foo), typeof(bar), typeof(baz))
#> [1] "character" "character" "character"
Here we’re using the c() function to combine the output of checking the type of each of the vectors, which itself returns a character vector! Very meta! How do we know this is a character vector? Well, we can of course simply check.
typeof(c(typeof(foo), typeof(bar), typeof(baz)))
#> [1] "character"
However, eventually you’ll become familiar with some of the context clues that R leaves behind when it prints objects. In this case, the quotation marks, ", are the clue.6
What if you need to include a quotation mark in a string? You have two options:
- Use the type of quotation mark that does not match the type of quotation mark used to create the string.
- Use the escape character \.
"cat'dog"
#> [1] "cat'dog"
'cat"dog'
#> [1] "cat\"dog"
"cat\"dog"
#> [1] "cat\"dog"
Because R uses the " symbol to print strings, if you insert a " into a string, R needs to display the escape character when printing. To see the string rendered without the escape character (and without the quotation marks used as syntax to define the character vector), use cat().
cat("cat\"dog")
#> cat"dog
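To make the role of the escape character concrete, the sketch below counts the characters in the string and compares the two ways of writing it. It relies only on the base R functions nchar() and identical(), which have not been introduced in the text above.

```r
# The backslash is syntax, not content: the string has 7 characters
nchar("cat\"dog")
#> [1] 7

# Both ways of writing the string create the same object
identical('cat"dog', "cat\"dog")
#> [1] TRUE
```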
For additional details and documentation on character vectors, use:
?character
The is.character() function checks if a vector is character typed and returns the logical value TRUE or FALSE accordingly.
is.character("cat")
#> [1] TRUE
is.character(42)
#> [1] FALSE
4.1.2 Double Vectors
Most numbers you encounter in R will be in the form of double vectors. A double vector stores floating point values.7
If you’re interested in some of the details of how R performs floating point arithmetic on your machine, use:
?.Machine
For our purposes, you can mostly just think of double vectors as numbers and ignore these details.
typeof(4.2)
#> [1] "double"
typeof(1.23e-10)
#> [1] "double"
typeof(6)
#> [1] "double"
typeof(pi)
#> [1] "double"
Notice that for each of the above, the type is double. You might think that 6 is integer typed, but again, when you type a number, it is almost always a double. More on integers shortly.
Numbers are doubles.8 Doubles are numbers.
Three special double values that you may encounter are Inf, -Inf, and NaN for infinity, negative infinity, and not-a-number respectively.
1 / 0
#> [1] Inf
-42 / 0
#> [1] -Inf
0 / 0
#> [1] NaN
For details on each, use:
?Inf
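If you need to detect these special values in your own code, base R provides dedicated checks. The following is a brief sketch using is.finite(), is.infinite(), and is.nan(), none of which are covered in the text above:

```r
# Only the ordinary number counts as finite
is.finite(c(1, Inf, -Inf, NaN))
#> [1]  TRUE FALSE FALSE FALSE

is.infinite(-42 / 0)
#> [1] TRUE
is.nan(0 / 0)
#> [1] TRUE

# The special values are still doubles
typeof(Inf)
#> [1] "double"
typeof(NaN)
#> [1] "double"
```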
It is often useful to create sequences of numbers. To do so, use the seq() function, which generally takes three arguments:
- from, the starting value of the sequence.
- to, the upper limit of the sequence. This may or may not be the last value in the sequence.
- by, how to increment between values of the sequence.
The function returns a (usually double) vector containing the elements of the sequence defined.
seq(from = 0, to = 1, by = 0.1)
#> [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Alternatively, you can use the length.out argument instead of by to specify the length of the output, and the increment will be calculated automatically.
seq(from = 0, to = 1, length.out = 21)
#> [1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
#> [16] 0.75 0.80 0.85 0.90 0.95 1.00
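The remark that the upper limit may or may not be the last value in the sequence is easiest to see with an increment that does not divide the range evenly. A small sketch:

```r
# With by = 0.3, the next step after 0.9 would overshoot 1,
# so the sequence stops before reaching the upper limit
seq(from = 0, to = 1, by = 0.3)
#> [1] 0.0 0.3 0.6 0.9

# length.out fixes the number of elements instead of the step size
length(seq(from = 0, to = 1, length.out = 21))
#> [1] 21
```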
For additional details of the seq() function, use:
?seq
For additional details and documentation on double vectors, use:
?double
The is.double() function checks if a vector is double typed and returns the logical value TRUE or FALSE accordingly.
is.double(42)
#> [1] TRUE
is.double("cat")
#> [1] FALSE
4.1.3 Integer Vectors
Sometimes, a number in R is an integer vector.9 There are two ways you are likely to encounter them:
- By adding the suffix L to the end of a number.10 For example: 42L.
- Using the : operator to create integer sequences.
While humans would recognize 42 as an integer, in R, simply typing 42 will produce a double.
typeof(42)
#> [1] "double"
To indicate that you would like 42 stored as an integer, use 42L.
typeof(42L)
#> [1] "integer"
It’s rare that you truly need to do this. However, by chance, you will often create integer vectors when using the : operator. The : operator can be used to quickly create sequences. While it does not necessarily create integer sequences, if the resulting vector can be properly represented by integers, it will return integers.
1:10
#> [1] 1 2 3 4 5 6 7 8 9 10
typeof(1:10)
#> [1] "integer"
However, it can also be used to create obviously non-integer sequences.
1.1:20
#> [1] 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1 11.1 12.1 13.1 14.1 15.1
#> [16] 16.1 17.1 18.1 19.1
typeof(1.1:20)
#> [1] "double"
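Because the printed output of an integer vector and a double vector can look the same, it can help to compare them explicitly. The sketch below uses == with all(), and identical(), which are not introduced in the text above; the point is only that the values agree while the types differ.

```r
# Same values, different types
typeof(1:3)
#> [1] "integer"
typeof(c(1, 2, 3))
#> [1] "double"
all(1:3 == c(1, 2, 3))
#> [1] TRUE

# identical() also compares type, so it reports a difference
identical(1:3, c(1, 2, 3))
#> [1] FALSE
identical(1:3, c(1L, 2L, 3L))
#> [1] TRUE
```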
For additional details, use
?`:`
While R does make the distinction between integer and double, often, R users who care about higher level abstractions involved in data analysis do not. As such, in addition to a type, all objects have a mode. We won’t dig into the details of mode, but you should be aware that integer and double vectors share the same mode: numeric.
Just as the typeof() function determines the type of an object, the mode() function determines its mode.
mode(42) # double type
#> [1] "numeric"
mode(42L) # integer type
#> [1] "numeric"
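Placing type and mode side by side may make the relationship clearer. This is only a sketch; for the four vector types this chapter focuses on, mode differs from type only in grouping integer and double together.

```r
# Type distinguishes integer from double; mode groups them together
c(typeof(42), typeof(42L))
#> [1] "double"  "integer"
c(mode(42), mode(42L))
#> [1] "numeric" "numeric"

# For character and logical vectors, mode and type agree
c(mode("foo"), mode(TRUE))
#> [1] "character" "logical"
```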
For additional details and documentation on integer vectors, use:
?integer
For additional details and documentation on numeric vectors, use:
?numeric
For additional details and documentation about mode, use:
?mode
The is.integer() function checks if a vector is integer typed and returns the logical value TRUE or FALSE accordingly.
is.integer(42)
#> [1] FALSE
is.integer(42L)
#> [1] TRUE
The is.numeric() function checks if a vector has mode numeric and returns the logical value TRUE or FALSE accordingly. It will return TRUE for both integer and double typed vectors.
is.numeric(42)
#> [1] TRUE
is.numeric(42L)
#> [1] TRUE
is.numeric("foo")
#> [1] FALSE
4.1.4 Logical Vectors
As the name suggests, logical vectors store logical values. There are two logical values, TRUE and FALSE.
typeof(TRUE)
#> [1] "logical"
typeof(FALSE)
#> [1] "logical"
Logical vectors will be important later for subsetting and other programming tasks.
Technically, the NA value is also a logical vector.
typeof(NA)
#> [1] "logical"
However, this requires additional explanation that we will defer until after we have introduced type coercion.
Note that T and F can be used as shortcuts to TRUE and FALSE. That is, they are names that are by default assigned to TRUE and FALSE.
T
#> [1] TRUE
F
#> [1] FALSE
However, we recommend you not use them. While TRUE and FALSE are reserved words, T and F are not. Reserved words are words (names) that reference a particular object, and cannot be used to refer to other objects. That is, you cannot reassign these names. To see a list of reserved words, use:
?Reserved
To demonstrate, attempting to use TRUE during assignment results in an error.
TRUE = 42
#> Error in TRUE = 42: invalid (do_set) left-hand side to assignment
You’re free to do evil things with T and F. For example:
T = FALSE
F = TRUE
c(T, F)
#> [1] FALSE TRUE
You’ve been warned.
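If you would like to experiment with this without disturbing your session, one option is to do the reassignment inside local(), which evaluates code in a temporary environment. This is only a sketch; the name risky is a hypothetical placeholder.

```r
# local() keeps the reassignment from leaking into the rest of the session
risky = local({
  T = 0        # T now names a double, not the logical value TRUE
  typeof(T)
})
risky
#> [1] "double"

# The reserved word is unaffected
typeof(TRUE)
#> [1] "logical"
```

Code that spells out TRUE and FALSE cannot be broken this way, which is the reason for the recommendation above.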
For additional details and documentation on logical vectors, use:
?logical
The is.logical() function checks if a vector is logical typed and returns the logical value TRUE or FALSE accordingly.
is.logical(TRUE)
#> [1] TRUE
is.logical(42)
#> [1] FALSE
4.2 Length and Indexing
Consider the following vectors:
evens = seq(from = 2, to = 100, by = 2)
primes = c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
The length of a vector is the number of elements of the vector. You can determine the length of a vector with the length() function.
length(evens)
#> [1] 50
length(primes)
#> [1] 10
The vectors assigned names evens and primes have lengths 50 and 10 respectively.
Let’s print the evens vector.
evens
#> [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
#> [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
#> [39] 78 80 82 84 86 88 90 92 94 96 98 100
You should notice two things. First, it displays each element of the vector, as you would expect. Second, there is some additional information printed along the left hand side. In the case of these notes, we see [1], [20], and [39].11 What is this information?
These numbers, [1], [20], and [39], are specific indexes of the vector. That is, each element of a vector has an index which is its position in the vector. R is a 1-indexed language. That is, the first element of the vector has index 1.12
This is why you see [1] in the console so often. Since the vast majority of vectors have at least one element, and R prints the index of the first element displayed on each line when printing a vector, you almost always see [1]. Note that, while it appears that a vector is being printed in rows (lines), a vector is a 1-dimensional, flat object. This is simply a printing side-effect to make the results more human readable.
While we will discuss subsetting vectors and other objects in great detail later, for now we’ll note that the [[ operator can be used to access a specific element of a vector by supplying its index.13
evens[[20]] # access the 20th element of evens
#> [1] 40
evens[[39]] # access the 39th element of evens
#> [1] 78
primes[[4]] # access the 4th element of primes
#> [1] 7
primes[[length(primes)]] # access the last element of primes
#> [1] 29
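Valid indexes run from 1 up to the length of the vector. The sketch below shows both ends of that range and what happens one step past it. The tryCatch() and conditionMessage() calls are used here only to capture the error message as a value; they are not otherwise covered in this chapter.

```r
primes = c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

# The first and last valid indexes
primes[[1]]
#> [1] 2
primes[[length(primes)]]
#> [1] 29

# Asking [[ for an index past the end is an error
tryCatch(primes[[11]], error = function(e) conditionMessage(e))
#> [1] "subscript out of bounds"
```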
Vectors of a specific type and arbitrary length can be created with the vector() function.
vector(mode = "logical", length = 3)
#> [1] FALSE FALSE FALSE
vector(mode = "integer", length = 4)
#> [1] 0 0 0 0
vector(mode = "double", length = 5)
#> [1] 0 0 0 0 0
vector(mode = "character", length = 6)
#> [1] "" "" "" "" "" ""
Notice that each of the above contains elements that are similar to 0 for its specific type, so FALSE for logical and an empty string for character.
All vector types can have length zero. When using the vector() function, if you do not specify a length, or directly specify 0, it creates a length zero vector of that type.
vector(mode = "logical")
#> logical(0)
vector(mode = "integer")
#> integer(0)
vector(mode = "double")
#> numeric(0)
vector(mode = "character")
#> character(0)
You will eventually encounter vectors that look like this14 when something goes wrong with your code. As such, it is helpful to understand what they represent, and especially what their type is.
There are also four shortcut functions related to these with more specific functionality: logical(), integer(), double(), and character().
logical(length = 10)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
integer()
#> integer(0)
double(length = 0)
#> numeric(0)
character(length = 5)
#> [1] "" "" "" "" ""
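Since zero-length vectors tend to show up when something has gone wrong, it can be useful to inspect one directly. The following sketch checks the type and length of an empty vector and confirms that the shortcut functions agree with the corresponding vector() calls; the name empty is a hypothetical placeholder.

```r
# Zero-length vectors still have a type and a length
empty = character(0)
typeof(empty)
#> [1] "character"
length(empty)
#> [1] 0

# The shortcut functions match the corresponding vector() calls
identical(double(length = 5), vector(mode = "double", length = 5))
#> [1] TRUE
identical(character(0), vector(mode = "character"))
#> [1] TRUE
```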
4.3 NULL
Somewhat related to the concept of a zero length vector is the NULL value. The NULL value represents nothing, like the empty set in mathematics. It has type NULL and length zero.
typeof(NULL)
#> [1] "NULL"
length(NULL)
#> [1] 0
When used together with the c() function to combine vectors, it is more or less ignored.
c(1, NULL, 42)
#> [1] 1 42
When c() is used with no arguments, it produces the NULL value.
c()
#> NULL
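NULL and a zero-length vector both have length zero, so it is worth seeing how to tell them apart. The sketch below uses is.null(), a base R function not introduced in the text above.

```r
# NULL is ignored by c(), so only two elements remain
length(c(1, NULL, 42))
#> [1] 2

# is.null() distinguishes NULL from a zero-length vector
is.null(NULL)
#> [1] TRUE
is.null(logical(0))
#> [1] FALSE

# Both have length zero, but their types differ
c(typeof(NULL), typeof(logical(0)))
#> [1] "NULL"    "logical"
```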
For additional details, use:
?NULL
4.4 Attributes
Objects in R can have attributes. They are not part of the object, and often won’t be displayed when you print the object. You can generally think of attributes as metadata. The attributes() function can be used to both modify and access the attributes of an object. So far, none of the objects that we’ve created have had attributes.
attributes(1:10)
#> NULL
An object without attributes will return NULL.
a_vector = c(1, 2, 3)
attributes(a_vector) = list(foo = "bar")
attributes(a_vector)
#> $foo
#> [1] "bar"
The above shows how to set an arbitrary attribute. Doing so requires using a list, which we haven’t discussed yet.
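If you would rather avoid lists for now, base R also provides attr(), which works with one attribute at a time. The following is a minimal sketch of the same idea; the attribute names foo and baz are arbitrary placeholders.

```r
a_vector = c(1, 2, 3)

# attr() sets or reads a single attribute by name
attr(a_vector, "foo") = "bar"
attr(a_vector, "foo")
#> [1] "bar"

# An attribute that was never set comes back as NULL
attr(a_vector, "baz")
#> NULL

# The underlying values are unchanged
length(a_vector)
#> [1] 3
```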
Some attributes are special and have reserved meaning. One of these attributes, class, we will return to in great detail later when we discuss the S3 class system for object oriented programming.
For now, we’ll discuss the names attribute.
In the case of atomic vectors, the names attribute can be used to assign a name to each of the elements of the vector. This can be done with the syntax above, but because it is such a common operation, shortcuts are available and we suggest you use them.
Names can be quickly added to an atomic vector in two ways:
- Using the names() function.
- By supplying named arguments to the c() function when creating a vector.
vec_1 = c(1, 2, 3)
names(vec_1) = c("one", "two", "three")
Here we’ve created a vector, then added a name to each element. Notice that we specified the names using a character vector of the same length as the vector they are being added to.
When we print this vector now, the names will display above each element of the vector. We also no longer see the [1] that we have become accustomed to seeing.
vec_1
#> one two three
#> 1 2 3
Remember though, these names are not part of the vector, they are simply metadata. If we perform an operation with this vector, the values will change but the names will remain.
vec_1 * 2
#> one two three
#> 2 4 6
If we want to check the names of a vector, use the names() function.
names(vec_1)
#> [1] "one" "two" "three"
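To see that names really are stored as an attribute, we can inspect the vector with attributes() and then strip the names again. This sketch uses unname(), a base R function not mentioned in the text above.

```r
vec_1 = c(1, 2, 3)
names(vec_1) = c("one", "two", "three")

# The names live in the attributes of the vector
attributes(vec_1)
#> $names
#> [1] "one"   "two"   "three"

# unname() returns a copy without the names; the values are untouched
unname(vec_1)
#> [1] 1 2 3
```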
Now let’s see an example of naming the elements of a vector as it is created.15
vec_2 = c(uno = 1, dos = 2, tres = 3)
Again, by doing so, when we print this vector, it will display the names above each element.
vec_2
#> uno dos tres
#> 1 2 3
names(vec_2)
#> [1] "uno" "dos" "tres"
When an atomic vector has names, in addition to accessing individual elements via their index, you can also access them by name.
vec_2[["dos"]]
#> [1] 2
For reasons that will become clear after we discuss lists and subsetting, you can also perform a related operation with a single bracket, [.
vec_2["dos"]
#> dos
#> 2
Notice a minor difference in output. With a single bracket the name is retained, while with the double bracket it is not.16
Be careful when using this feature, as names are not required to be unique!
vec_3 = c(z = 1, z = 2, z = 3)
vec_3[["z"]]
#> [1] 1
You’ve been warned!
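One way to guard against this is to check for repeated names before relying on access by name. The sketch below uses anyDuplicated(), a base R function not covered in this chapter; access by index is unaffected by duplicate names.

```r
vec_3 = c(z = 1, z = 2, z = 3)

# anyDuplicated() returns 0 when there are no duplicates,
# otherwise the index of the first duplicated entry
anyDuplicated(names(vec_3))
#> [1] 2

# Access by index still reaches every element
vec_3[[3]]
#> [1] 3
```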
4.5 Summary
- TODO: You learned a lot in this chapter…
- TODO: these are fundamental / core ideas that you will likely want to return to often…
4.6 What’s Next?
- TODO: coercion
- TODO: vectorization
1. Often, when R users refer to vectors, they implicitly mean atomic vectors, but as beginners, you should be very aware of this distinction and not make too many assumptions.
2. If you need to use complex vectors, after understanding doubles, they will be mostly self-explanatory.
3. At some point, we’ll stop stating this and you’ll need to understand it from context.
4. This is unfortunately confusing because in everyday language we use character to refer to a single character of text.
5. What kind of name is foo? Well, honestly just a placeholder because coming up with useful names quickly is hard.
6. The output here is of course also a character vector, but we’ll stop before we get trapped in an infinite loop.
7. Remember, we need floating point arithmetic to deal with real numbers, which might have infinite precision, while using a computer with finite memory. For our purposes, we’ll largely ignore the details of how this works, and R will just deal with it for us. If you’re interested in understanding more about floating point arithmetic, consider reading “What Every Programmer Should Know About Floating-Point Arithmetic”.
8. Why is it called double? Because it uses double precision.
9. Why bother to create this distinction at all? Why not simply only use double vectors? Long story short: floating-point arithmetic. Noticing a pattern yet?
10. Why L? Because R uses long integers. The R language definition is actually silent on this etymology.
11. These results depend on the size of the window the printing takes place in, so your results may vary. Apologies if the indexes are even a bit off in this textbook. It’s hard to predict where the cutoffs will be.
12. Many other languages are 0-indexed. While computer scientists have good reason to believe 0-indexing is superior, when designing a language for statistical computing, which relies heavily on linear algebra, 1-indexing is a natural choice. Have you ever heard of the 0th row of a matrix?
13. If you’re already familiar with R, you may have expected [ instead of [[. We’ll explain this odd choice when we discuss subsetting in general. The distinction is important, but difficult to explain at this moment.
14. Perhaps most often character(0).
15. This is an early example of what’s called non-standard evaluation, NSE. Technically, the proper code would be c("uno" = 1, "dos" = 2, "tres" = 3), but the c() function allows us to specify the names unquoted. NSE is one of the trickier bits of R, that we’ll mostly try to ignore. We’ll occasionally point it out where not noting it could cause confusion.
16. Later, we’ll note this is the difference between a preserving and a simplifying subset.