3  Objects & Functions

This chapter introduces the concept of an object in R. It also serves as an introduction to functions, which are themselves objects. The chapter gives a broad overview of these two ideas, and subsequent chapters will reveal their details. The following quotation should highlight just how important these ideas are.

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

John Chambers

After reading this chapter you should be able to:

3.1 Creating Objects

In the previous chapter, you created a number of objects. Everything in R is an object, so anything you created was necessarily an object.1 In this chapter, we’re going to mostly ignore the details of objects, and instead give you a general sense of what they are and what they can be used for. In the most general sense, objects are data stored in memory that R can access.

Although it is a massive oversimplification, you can broadly group objects into two categories:

  • Objects that store data such as numbers, strings, and logical values.
  • Objects that store code, which for the most part you can think of as functions.2

Let’s create some objects to demonstrate:

42 # a number
#> [1] 42
"STAT 385" # a string
#> [1] "STAT 385"
TRUE # a logical value
#> [1] TRUE

Running the above code, you will see R print output in the console. Note that the output you see in the console is not the object itself, but the result of R printing information about the object for you. This is a very technical distinction that we will return to several times including when we discuss data types and structures, as well as the S3 system. The objects themselves exist only in your computer’s memory.

The trouble is, we made those objects, but now have no way to return to them. They may still be in memory, but they are inaccessible to us.3 We could recreate them, and they would output to the console again, but those would technically be different objects as they would exist at a different location in memory. We need a way to create persistent access to objects.

3.1.1 Names and Assignment

What we need now is a variable, which is a way to associate a name with an object. Often you will hear language such as “store the object 42 in the variable x” but this is rather misleading. The variable does not “contain” or “store” the object 42. That variable’s name “points” to the object in memory and is more or less a human shortcut for accessing that memory.4

Associating a name to an object is called assignment. There are multiple ways to do this in R, but the most common is to use either <- or = with the name on the left-hand side (LHS) and the object on the right-hand side (RHS). These two operators, <- and =, are referred to as assignment operators.

x = 42

As a result of this assignment, two things have happened:

  • We have created the object that stores the value 42 which now exists in memory.
  • We have associated the name x with this object.5

Now, to access the object 42 we can use the name we have given it:

x
#> [1] 42

A brief note about the difference between = and <-. Both can be used for assignment. That is, the following would have the same effect as the above code:

x <- 42

We demonstrate both because the vast majority of R code that you see will use <-. However, there is a small but seemingly growing group of users who prefer =. There is a very slight technical difference, but you will not encounter it in this book, and it is possible you never encounter it in practice. The most important thing is that you pick one and are consistent. We recommend = as it is what you will see throughout this book, it is much easier to type, and it will be less confusing or frustrating for those coming from other languages.
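As a small sketch of the idea that a name points to an object rather than containing it, the following associates two names with the value 42 and then reassigns one of them. The name y is introduced here only for illustration.

```r
x = 42
y = x   # y is now a second name for the value 42
x = 43  # x now points to a different object; y is unaffected
x
#> [1] 43
y
#> [1] 42
```

Reassigning x did not change what y refers to, which is consistent with thinking of names as pointers to objects rather than as containers.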

There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton

Giving your objects names is a bit of an art. In notes like these, you’ll often see throw-away variable names like x, y, and z. In practice you’ll see verbose names like a_long_variable_name_describing_the_object. The general heuristic we’ll suggest is: Pick the smallest name possible such that someone reading the code will have a good chance of understanding what the object is, given its surrounding context. Obviously, there is a lot of room for subjectivity here. We’ll return to this later.

When creating names in R, there are a few things you should be aware of:

  • Like everything in R, names are case sensitive.
  • From the R documentation that can be accessed with ?make.names: “A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.” For reasons we’ll see when we discuss S3, we suggest you avoid dots (.).
  • You cannot use some reserved names. For a list of these, use ?Reserved.
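One way to check these rules yourself is sketched below. The candidate names are made up for illustration. The make.names() function converts its input into syntactically valid names, so a candidate that comes back unchanged was already valid.

```r
# hypothetical candidate names, chosen only for illustration
candidates = c("total", "Total", "rect_perim", "2fast", ".2way", "if")

# TRUE where the candidate is already a syntactically valid name
candidates == make.names(candidates)
#> [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# names are case sensitive, so these are two different names
total = 1
Total = 2
total == Total
#> [1] FALSE
```

The last three candidates fail for different reasons: one starts with a number, one starts with a dot followed by a number, and one is a reserved word.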

3.1.2 Global Environment

Let’s make a few more objects:

x = 42
y = "STAT 385"
z = TRUE

After running this code, you might notice that there is no output in the console. So how do you know that those objects were created? You can check via some additional R code, or RStudio will also provide some support here.

objects()
#> [1] "x" "y" "z"

The objects() function will return the names of all objects in the global environment. R places different objects in different environments, but for now, all the objects you create will be in the global environment.

The ls() function, which you might have used previously in a Unix type terminal, also lists objects in the global environment.

ls()
#> [1] "x" "y" "z"

RStudio’s Environment tab, by default in the top-right, will also provide information about objects in the global environment, as they are created. This tool can also be used to inspect objects in other environments.
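If you would rather check for one particular name than list everything, the base R function exists() can help. The sketch below also uses rm(), which removes a name from the environment. Neither function is covered elsewhere in this chapter, and the name scratch is made up, so treat this as a preview rather than a recommended workflow.

```r
scratch = 42
exists("scratch")  # is there an object named scratch?
#> [1] TRUE

rm(scratch)        # remove the name scratch from the environment
exists("scratch")
#> [1] FALSE
```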

3.1.3 Order of Evaluation

Be aware that the order that you run your code in is important. The following two examples demonstrate.

p = 1
q = 2
p = q
p
#> [1] 2

q = 2
p = q
p = 1
p
#> [1] 1

This is something that we will be vigilant about throughout these notes, and comment on often. Always remember this: When running a line of code, the result of running that code is a function of both the line of code itself, and the current state of the environment! The same line of code may have two different outcomes given different states of the environment.
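A minimal sketch of that last point: the line p = p + 1 appears twice below, yet it produces a different result each time because the state of the environment has changed in between.

```r
p = 1
p = p + 1  # p was 1, so p is now 2
p
#> [1] 2
p = p + 1  # the same line of code, but p was 2, so p is now 3
p
#> [1] 3
```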

3.1.4 The c() Function

There are many, many ways to create objects in R. By far, the most common is by using the c() function. This function is used to combine6 values into a vector, the most important data structure in R. We will give a detailed description of vectors (both atomic and generic) in the coming chapters, but for now, let’s simply demonstrate the ability to combine values and objects together.

c(3, 2, 1)
#> [1] 3 2 1

Here, we have combined the values 3, 2, and 1 into a vector. We use commas (,) inside the function call to separate the individual values we are combining. We could do similar operations with strings and logical values:

c("a string", "another string", "one more string")
#> [1] "a string"        "another string"  "one more string"
c(TRUE, FALSE, NA, FALSE, TRUE)
#> [1]  TRUE FALSE    NA FALSE  TRUE

For reasons that will be clear later, (atomic) vectors cannot mix numbers, strings, and logical values.
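As a brief preview, and without yet explaining the rules involved, here is what c() does when given a mix of types. Rather than producing an error, R converts the values to a single common type.

```r
c(1, "a", TRUE)  # everything becomes a string
#> [1] "1"    "a"    "TRUE"
c(1, TRUE)       # the logical value becomes a number
#> [1] 1 1
```

So the result is still a vector of one type, just possibly not the type you intended.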

Now let’s create some (numeric) vectors, and assign them names.

odd = c(1, 3, 5, 7)
even = c(2, 4, 6, 8)
big = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

We can access these vectors (objects) by a name:

odd
#> [1] 1 3 5 7
even
#> [1] 2 4 6 8
big
#>  [1]  1  2  3  4  5  6  7  8  9 10

In addition to combining values, the c() function can actually combine multiple vectors together:

c(42, odd, even, 42)
#>  [1] 42  1  3  5  7  2  4  6  8 42

Notice that it appears we are mixing both “numbers” and vectors. Actually, that is not true. We are simply combining together a number of vectors. This is because in R, there is no such thing as a scalar. Something like 42 is actually a vector of length 1. Much more on this next chapter.
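A quick way to convince yourself of this, using only base R functions:

```r
length(42)            # 42 is a vector of length 1
#> [1] 1
is.vector(42)
#> [1] TRUE
identical(42, c(42))  # wrapping it in c() changes nothing
#> [1] TRUE
```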

3.2 Writing and Using Functions

You’ve already used a number of functions in R like sqrt() and log(). But before we dive further into some details of using functions, it will probably be useful to write a few of our own functions. To do so, we will need to use the function() function.7

3.2.1 A Simple Function

Let’s start with the simplest function we can possibly write.

f = function() {
  
}

By running this code, we have created a function and assigned it a name f.8 It has no inputs. Inputs, which we will call the arguments of the function, are specified inside the () of the function() function. As we have left this blank, our function has no arguments, that is, the function has no way to obtain input. What does this function do? A function’s body is contained within the braces, {}, and specifies what code is run when we use the function.9 So in this case, the function does nothing.

How exciting!

Next, let’s talk about the difference between using f and f().

f
#> function() {
#>   
#> }

Evaluating f, without parentheses, will show you the entire function definition which is the arguments and body together. A function in R is made up of three components, its arguments (sometimes called formal arguments or formals), its body, and its environment. For now, you will only need to understand and use the arguments and body.10

f()
#> NULL

Evaluating f(), with parentheses, will “run” the function. That is, it will process the arguments (inputs) together with the code in the body, to produce output. Because this function does nothing, it returned NULL, a specific object we will discuss later.

Additional functions exist to extract and view specific parts of a function, for example, body(), args(), and formals().
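Because f has no arguments and an empty body, these functions are easier to see in action on something slightly larger. The sketch below uses a made-up function, add_values, that exists only for this illustration.

```r
# a hypothetical function, defined only for this illustration
add_values = function(a, b = 1) {
  a + b
}

names(formals(add_values))  # the names of the arguments
#> [1] "a" "b"
formals(add_values)$b       # the default value of the argument b
#> [1] 1
body(add_values)            # the code inside the braces
args(add_values)            # the arguments, displayed as a function with no body
```

Try running the last two lines yourself to see how R displays each piece.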

3.2.2 A Simple But Useful Function

Now let’s write a function that has both input and output.

calc_rect_perim = function(length, height) {
  (2 * length) + (2 * height)
}

Here, we have created a function to calculate the perimeter of a rectangle. It has two arguments: length and height. These are the inputs of the function. The body of the function calculates the perimeter for particular values of length and height.

Let’s demonstrate running this function. To do so, be sure you have run the code above, and that the name calc_rect_perim appears in your global environment. In RStudio, it will be in a section called Functions. How very helpful!

calc_rect_perim(length = 3, height = 4)
#> [1] 14

The following video demonstrates typing the above code very quickly. The trick is to hit the [tab] key, ⇆, often.

[Video: a demonstration of using the [tab] key, ⇆, to autocomplete function and argument names.]

Running the above, we have calculated the perimeter of a rectangle with length 3 and height 4. Essentially, the following code was evaluated11:

(2 * 3) + (2 * 4)
#> [1] 14

Notice that length and height never made it into your global environment. They only existed temporarily as variables inside the function. More on this later when we talk about scoping rules. Think of length and height as names assigned to temporary objects each time you run the function.
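One way to check this, sketched with the base R function exists(), which reports whether an object with a given name can be found. The check assumes you have not created an object named height yourself.

```r
calc_rect_perim = function(length, height) {
  (2 * length) + (2 * height)
}
calc_rect_perim(length = 3, height = 4)
#> [1] 14

# height was only a temporary name inside the function
exists("height")
#> [1] FALSE
```

We check only height because length is also the name of a built-in R function, so exists("length") would be TRUE no matter what our function did.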

What happens if we try to run this function without specifying the inputs, that is, without supplying values (objects) to the arguments?

calc_rect_perim()
#> Error in calc_rect_perim(): argument "length" is missing, with no default

We get an error because when we wrote the function, we added an input length but now did not supply a value for that argument. As such, R won’t know how to evaluate the body of the function because length won’t have a value.

When writing R functions, you can specify a default value of an argument. Let’s re-write this function:

calc_rect_perim = function(length = 1, height = 1) {
  (2 * length) + (2 * height)
}

This new version of the function specifies a default value of 1 for both arguments. Let’s try to run the function again without specifying any input.

calc_rect_perim()
#> [1] 4

This works! R falls back to any default value if a specific value for an argument isn’t given when running the function. Now that we have defaults, we could specify one input but not the other as well:

calc_rect_perim(length = 2)
#> [1] 6

Let’s show you another way of specifying inputs to a function. We do not recommend using this approach as beginners.

calc_rect_perim(3, 7)
#> [1] 20

The above is the same as the following:

calc_rect_perim(length = 3, height = 7)
#> [1] 20

Here we are using what is called positional argument matching. For the most part we do not recommend doing this for any argument other than the first argument to a function. Not naming the first argument is somewhat common practice, especially for the numerous functions that use x or data for the first argument. We’ll return to this idea when we talk about style. For now, either name all of your arguments when running a functions, or all except the first.
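Because a rectangle’s perimeter is the same whichever side you call the length, calc_rect_perim() cannot show what goes wrong when positions are mixed up. The sketch below uses a made-up function, divide, where the order of the inputs matters.

```r
# a hypothetical function where the order of the inputs matters
divide = function(numerator, denominator) {
  numerator / denominator
}

divide(numerator = 10, denominator = 2)  # named arguments
#> [1] 5
divide(denominator = 2, numerator = 10)  # named arguments can appear in any order
#> [1] 5
divide(10, 2)                            # positional matching
#> [1] 5
divide(2, 10)                            # positional, with the inputs swapped by mistake
#> [1] 0.2
```

With names, the call says what each value means. With positions only, a swapped pair of inputs silently gives a different answer.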

There exists another method called partial matching, but pretend we never told you that.12

3.2.3 Function Returns

Let’s write another function.

calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
}

Load this function into your environment by running the code above, then run the function:

calc_powers(x = 5)

Notice that nothing happens! Well, technically, something happened, and the function actually returned something. But, to make this function more useful, we need to be more careful to specify what the function should return, that is, we need to specify the output of the function.

By convention, R will return the last expression it evaluates, which is often the last line of code in the body of a function. Because there was an assignment on the last line, R returns that object, but invisibly. This sometimes useful feature is a bit confusing for beginners. To better understand why this is, type and run 42 then x = 42 in the console. Note that one prints output while the other does not.
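To see that a value really was returned, you can assign the result of the call to a name, or wrap the call in parentheses, which makes R print the value. The name out below is used only for illustration.

```r
calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
}

out = calc_powers(x = 5)  # nothing prints, but a value was returned
out
#> [1] 125
(calc_powers(x = 5))      # wrapping a call in parentheses forces printing
#> [1] 125
```

The returned value is 125 because the last expression evaluated was the assignment three = x ^ 3.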

Let’s make an edit:

calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
  result = c(zero, one, two, three)
}
calc_powers(x = 5)

This will now return something more useful, but it will still do so invisibly.

calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
  result = c(zero, one, two, three)
  result
}
calc_powers(x = 5)
#> [1]   1   5  25 125

Now, this is what we wanted. Because the last expression in the body of the function is not an assignment, R also prints the output!

calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
  c(zero, one, two, three)
}
calc_powers(x = 5)
#> [1]   1   5  25 125

Here we see that we could have simply skipped assigning the name result to the object we wanted to return. This is the most common practice when writing R functions. Later, there will be a need to potentially exit a function early, thus there is a return() function. We could instead write the following:

calc_powers = function(x) {
  zero = x ^ 0
  one = x ^ 1
  two = x ^ 2
  three = x ^ 3
  result = c(zero, one, two, three)
  return(result)
}

While many R programmers do not write their functions this way, we recommend it for beginners as it makes it abundantly clear what the output of the function is. Yes, there is an extra assignment (assigning the name result) and an added return statement. But we believe this will bring clarity as you practice writing your first functions.

Let’s run this function again. Be sure to load the most recent edit we made.

calc_powers(x = 5)
#> [1]   1   5  25 125

It returns a vector, which of course is an object. Notice that the inputs to our functions were also objects, in this case the object 5 which was temporarily assigned the name x inside the function.13

Since the inputs to functions are objects and the outputs of functions are objects, we can supply the output of one function as the input to another. For example:

mean(calc_powers(x = 5))
#> [1] 39

This is called function composition, which you have seen expressed mathematically with expressions like the following.

\[ f(g(x)) \]

In our case the output of running calc_powers(x = 5) was supplied as the input to mean(). This idea is extremely powerful, but can sometimes make it difficult to write readable code. Two strategies exist to assist: intermediate variables and piping. We’ll return to these later.
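As a short preview, here is roughly what those two strategies look like. This sketch uses a condensed version of calc_powers() so that it runs on its own, the name powers is made up, and the pipe shown is the native |> operator, which assumes R 4.1.0 or newer.

```r
# condensed version of calc_powers(), for a self-contained example
calc_powers = function(x) {
  c(x ^ 0, x ^ 1, x ^ 2, x ^ 3)
}

# strategy 1: an intermediate variable
powers = calc_powers(x = 5)
mean(powers)
#> [1] 39

# strategy 2: the native pipe operator (assumes R 4.1.0 or newer)
calc_powers(x = 5) |> mean()
#> [1] 39
```

Both produce the same result as mean(calc_powers(x = 5)). The difference is only in how the code reads.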

3.3 Vectorization

When discussing R code, you will often hear of vectorized code, or vectorization. We make some very brief comments in this section, but will return to this idea when we discuss vectors and functions in more detail later.

Welcome to R Club.

  • The first rule of R Club is: Do not use for loops!
  • The second rule of R Club is: Do not use for loops!
  • And the third and final rule: If you have to use a for loop, do not grow vectors!14

— Unknown

This fictitious quotation is a bit over-the-top, and there isn’t actually anything wrong with for loops in R15, but it should serve to get the reader’s attention. Computations that might require a for loop in other languages can often be written without a for loop in R. Furthermore, by avoiding a for loop in R, your code will likely be easier to write, easier for other programmers to understand, and possibly run faster.

# don't do this
x = c(6, 4, 3, 6, 7, 8, 9, 10)
y = 0
for (i in x) {
  y = y + i
}
y
#> [1] 53
# instead, do this
x = c(6, 4, 3, 6, 7, 8, 9, 10)
sum(x)
#> [1] 53
# don't do this
x = c(6, 4, 3, 6, 7, 8, 9, 10)
y = c()
for (i in x) {
  y = c(y, i + 1)
}
y
#> [1]  7  5  4  7  8  9 10 11
# instead, do this
x = c(6, 4, 3, 6, 7, 8, 9, 10)
x + 1
#> [1]  7  5  4  7  8  9 10 11

For now, simply pretend that you’ve never heard of or seen a for loop. Also, the loops written above are purposefully written poorly! Do not use these as example for loops.

We’ll dive further into vectorized code later, but for now, know that there are many functions in R that take as input a vector and output some function applied to the vector as a whole. Some examples:

# a vector we will perform operations on
x = c(5, 1, 3, 5, 13, 7, 9)
x
#> [1]  5  1  3  5 13  7  9
length(x) # length of the x vector
#> [1] 7
sum(x) # sum over x
#> [1] 43
prod(x) # product over x
#> [1] 61425

In more mathematical notation, the above would be:

\[ \texttt{sum(x)} = \sum_{i = 1}^{n} x_i \]

\[ \texttt{prod(x)} = \prod_{i = 1}^{n} x_i \]

Written this way, it really appears as if a for loop would be useful, but again, notice how much easier sum(x) is to read and write.

min(x) # find the minimum of x
#> [1] 1
max(x) # find the maximum of x
#> [1] 13
mean(x) # sample mean of x
#> [1] 6.142857
var(x)  # sample variance of x
#> [1] 15.80952
sd(x)   # sample standard deviation of x
#> [1] 3.976119

The previous three examples, mean(), var(), and sd(), compute descriptive statistics. They are functions of samples. They are not functions of distributions.16

Mathematically, these are:

\[ \texttt{mean(x)} = \frac{1}{n}\sum_{i = 1}^{n} x_i \]

\[ \texttt{var(x)} = \frac{1}{n - 1}\sum_{i = 1}^{n} (x_i - \bar{x}) ^ 2 \]

\[ \texttt{sd(x)} = \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n} (x_i - \bar{x}) ^ 2} \]

Here, \(n\) is the length of the vector and \(\bar{x}\) is the sample mean of the vector.
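To connect the formulas to code, the following sketch computes the sample variance and standard deviation "by hand" using vectorized operations. The names n and x_bar are chosen to mirror the notation above.

```r
x = c(5, 1, 3, 5, 13, 7, 9)
n = length(x)     # the length of the vector
x_bar = mean(x)   # the sample mean

sum((x - x_bar) ^ 2) / (n - 1)        # matches var(x)
#> [1] 15.80952
sqrt(sum((x - x_bar) ^ 2) / (n - 1))  # matches sd(x)
#> [1] 3.976119
```

In practice you would simply call var(x) and sd(x). The point is only that the formulas translate directly, and without a for loop.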

Note that mean(x) can also be written as:

sum(x) / length(x)
#> [1] 6.142857

Students have a tendency to use this, but remember, mean(x) is much easier to read and write.

range(x) # the range of x values
#> [1]  1 13
summary(x) # a statistical summary of x
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.000   4.000   5.000   6.143   8.000  13.000
cumsum(x)  # cumulative sum
#> [1]  5  6  9 14 27 34 43
cumprod(x) # cumulative product
#> [1]     5     5    15    75   975  6825 61425
cummax(x)  # cumulative maximum
#> [1]  5  5  5  5 13 13 13
cummin(x)  # cumulative minimum
#> [1] 5 1 1 1 1 1 1

As a reminder, to view the documentation of any of the above, use ?name_of_function, for example ?mean.

Many of the above examples, such as sum() and mean(), take as input a vector of any length and output a vector of length one. (Some, like range() and the cumulative functions, output longer vectors.) Other functions, especially arithmetic and other mathematical functions, will perform element-by-element operations. That is, there are many functions that take as input a vector of some length, then output a vector of the same length.
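A quick check of how output length relates to input length, using length() on the results:

```r
x = c(5, 1, 3, 5, 13, 7, 9)
length(x)
#> [1] 7
length(sum(x))   # a single value summarizing the whole vector
#> [1] 1
length(sqrt(x))  # one value per element of x
#> [1] 7
```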

Some examples:

# a vector we will perform operations on
y = c(5, 4, 3, 2, 1)
y
#> [1] 5 4 3 2 1
y + 1
#> [1] 6 5 4 3 2
y - 2
#> [1]  3  2  1  0 -1
y * 3
#> [1] 15 12  9  6  3
y / 2
#> [1] 2.5 2.0 1.5 1.0 0.5

What’s actually happening in these examples is a bit tricky. We’ll need to revisit when we talk about length coercion.
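As a hedged preview of what is going on: recall that 1 is a vector of length one. In y + 1, R in effect repeats that single value so that it lines up with every element of y. The check below is a sketch of that idea, not a full account of the rules.

```r
y = c(5, 4, 3, 2, 1)
y + 1
#> [1] 6 5 4 3 2
y + c(1, 1, 1, 1, 1)  # the same computation, written out in full
#> [1] 6 5 4 3 2
identical(y + 1, y + c(1, 1, 1, 1, 1))
#> [1] TRUE
```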

sqrt(y)
#> [1] 2.236068 2.000000 1.732051 1.414214 1.000000
log(y)
#> [1] 1.6094379 1.3862944 1.0986123 0.6931472 0.0000000

You should try some additional mathematical functions from the previous chapter as well.

3.4 Summary

In this chapter you’ve learned to mentally model computation in R through objects and functions. The other “details” presented in this chapter are things we will likely return to several times throughout this book.

3.5 What’s Next?

In the next chapters we will:

  • Start a deep dive into Base R, the built-in capabilities of the language.
  • Introduce data structures and types.

  1. If you’re familiar with object-oriented programming don’t confuse everything being an object with the sorts of objects you might be used to seeing in such a paradigm. In this case objects are just “things,” that is, something that stores data, and won’t necessarily have methods (code) attached to them. R does have multiple systems for OOP, one of which, S3, we will discuss later.↩︎

  2. Code is data, but again, we’re being general and not technical here.↩︎

  3. This is actually a lie. Technically we can retrieve whichever object was created last with .Last.value.↩︎

  4. As you’re first learning R, this may seem like a trivial detail, but it is actually incredibly important. While the full details are outside the scope of this course, it is highly recommended that at some point in your R career you read the Names and Values chapter of Advanced R.↩︎

  5. We use this perhaps odd language to clarify that the object itself doesn’t have a name but that the name points to the object. An object can have multiple names pointing to it, that is, multiple names that will return that specific object.↩︎

  6. You might also hear this function referred to as the concatenate function, but we find that to be confusing as to some users this will imply that you are performing string concatenation.↩︎

  7. Yes there is a function to create functions. Very meta.↩︎

  8. You will sometimes see R programmers create functions without names. These are called anonymous functions. They will be useful later, but for now you should assign a name to all of your functions.↩︎

  9. Technically, you do not always need to use braces, and could simply write an expression where the braces are. However, any multiple line expression will necessarily require braces. We recommend always placing a function’s body within braces.↩︎

  10. You do not need to specify an environment when creating a function. This happens automatically.↩︎

  11. R mostly uses a pass by value evaluation strategy.↩︎

  12. For additional details, see the Argument Matching section of the R Language Definition.↩︎

  13. Objects assigned to names zero, one, two, three, and result also temporarily existed when the function ran. Notice they did not appear in your global environment.↩︎

  14. We should probably also suggest you iterate over the indexes of a vector rather than the elements of the vector, but that doesn’t have as nice a ring to it. Again, more on this when we discuss vectorization and iteration later.↩︎

  15. But for loops can be written very poorly in R, which we will discuss later.↩︎

  16. They are estimates (statistics) of population parameters.↩︎