8 Logical Operations
As we saw with subsetting, logical vectors play an important role in R. We will first introduce comparison operators that are used to create logical vectors. Then we will discuss Boolean operators that, as the name suggests, perform operations on logical vectors. With both in hand, we will return to logical subsetting, since we can now create more useful logical vectors to subset with.
After reading this chapter you should be able to:
- Use comparison operators and related functions to create logical vectors.
- Use Boolean operators to perform operations on logical vectors.
- Combine comparison and Boolean operators to perform logical subsetting.
8.1 Relational Operators
There are six binary comparison (sometimes called relational) operators in R.
- `x < y` checks if `x` is less than `y`
- `x > y` checks if `x` is greater than `y`
- `x <= y` checks if `x` is less than or equal to `y`
- `x >= y` checks if `x` is greater than or equal to `y`
- `x == y` checks if `x` is equal to `y`
- `x != y` checks if `x` is not equal to `y`
Each of the six operators expects two atomic vectors as input, one on each side of the operator.1 Each operator returns a logical vector that is the same length as the longest input vector.
For detailed documentation, use:
?base::Comparison
These operators perform the operation that you (probably) expect given numeric (integer or double) input. Consider the following numeric examples:
Operator | Example Expression | Evaluation |
---|---|---|
< | 1.0 < 2.3 | TRUE |
> | 1.0 > 2.3 | FALSE |
<= | 1.0 <= 2.3 | TRUE |
>= | 1.0 >= 2.3 | FALSE |
== | 1.0 == 2.3 | FALSE |
!= | 1.0 != 2.3 | TRUE |
With logical input, you can reliably predict the output if you assume that inputs are first coerced to numeric. That is, `TRUE` becomes `1` and `FALSE` becomes `0`.
Operator | Example Expression | Evaluation |
---|---|---|
< | TRUE < FALSE | FALSE |
> | TRUE > FALSE | TRUE |
<= | TRUE <= FALSE | FALSE |
>= | TRUE >= FALSE | TRUE |
== | TRUE == FALSE | FALSE |
!= | TRUE != FALSE | TRUE |
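The coercion rule can be checked directly. The following is a small sketch, not one of the chapter's original examples: it converts logical values to numeric explicitly and confirms that a comparison on the converted values agrees with the same comparison on the logical values. The names `lgl_result` and `num_result` are made up for illustration.

```r
# explicit coercion of logical values to numeric
as.numeric(c(TRUE, FALSE))
#> [1] 1 0

# comparing logicals gives the same result as comparing their numeric versions
lgl_result = TRUE > FALSE
num_result = as.numeric(TRUE) > as.numeric(FALSE)
identical(lgl_result, num_result)
#> [1] TRUE
```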
These comparison operators can be used with string input (via character vectors) but come with several complications related to the locale under which R is run. A locale defines preferences based on a user’s language and location that often have an effect on how strings are processed. In this case, the possibility of different locales means that the same comparison performed under different locales can lead to different results.2
The following examples were evaluated using the `en_US.UTF-8` locale.
Operator | Example Expression | Evaluation |
---|---|---|
< | "foo" < "bar" | FALSE |
> | "foo" > "bar" | TRUE |
<= | "foo" <= "bar" | FALSE |
>= | "foo" >= "bar" | TRUE |
== | "foo" == "bar" | FALSE |
!= | "foo" != "bar" | TRUE |
Character strings are compared using lexicographical order, which extends the idea of alphabetical order to consider letters, numbers, and additional symbols. To get a sense of this ordering in the `en_US.UTF-8` locale, consider the following:3
string_order = sort(c(
  letters,
  LETTERS,
  0:9,
  c("!", "@", "#", "$", "%", "^", "&", "*", "(", ")"),
  c("`", "~", "-", "_", "=", "+"),
  c("[", "{", "]", "}", "\\", "|"),
  c(";", ":", "'", '"', ",", "<", ".", ">", "/", "?", " ")
))
noquote(string_order)
#> [1] _ - , ; : ! ? . ' " ( ) [ ] { } @ * / \\ & # % `
#> [26] ^ + < = > | ~ $ 0 1 2 3 4 5 6 7 8 9 a A b B c C d
#> [51] D e E f F g G h H i I j J k K l L m M n N o O p P
#> [76] q Q r R s S t T u U v V w W x X y Y z Z
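To see that the ordering depends on the collation rules in effect, it can help to compare the default sort with one that ignores the locale. This is only a sketch: the vector `mixed_case` is a made-up example, and the result of the default `sort` call depends on the locale of your R session, so no output is shown for it. With `method = "radix"`, `sort` orders character vectors as in the C locale, where all uppercase letters come before all lowercase letters.

```r
# a small made-up character vector
mixed_case = c("b", "A", "a", "B")

# the collation setting currently in effect for this R session
Sys.getlocale("LC_COLLATE")

# locale-dependent ordering (the result varies with the locale)
sort(mixed_case)

# C-locale ordering, regardless of the session locale
sort(mixed_case, method = "radix")
#> [1] "A" "B" "a" "b"
```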
So far, the examples we’ve considered compared two vectors of length one. But at this point, it should come as no surprise that these operators are vectorized. Some examples to illustrate this point:
c(6, 1, 2, 3, 5, 8) > c(1, 6, 2, 3, 5, 6)
#> [1] TRUE FALSE FALSE FALSE FALSE TRUE
1:10 <= pi
#> [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1:10 >= c(8, pi)
#> [1] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
c(TRUE, TRUE, FALSE) == 1
#> [1] TRUE TRUE FALSE
As we’ve seen before, using vectorization often introduces type coercion and recycling. Consider a specific example from those above:
c(TRUE, TRUE, FALSE) == 1
First, note that on the left-hand side (LHS), we have a logical vector, and on the right-hand side (RHS), a double vector. Thus, type coercion must take place.
c(1, 1, 0) == 1 # after type coercion
There is a vector of length three on the LHS, but a vector of length one on the RHS. So as expected, recycling will also occur.
c(1, 1, 0) == c(1, 1, 1) # after recycling
Then, lastly, comparison is done element-by-element to arrive at the result.
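The three steps just described can be carried out by hand to confirm that they reproduce what R does automatically. This is a sketch for illustration; the names `lhs`, `rhs`, `lhs_coerced`, `rhs_recycled`, and `manual_result` are our own, and R does not literally call these functions internally.

```r
lhs = c(TRUE, TRUE, FALSE)
rhs = 1

# step 1: type coercion, logical to double
lhs_coerced = as.numeric(lhs)

# step 2: recycle the shorter vector to the length of the longer one
rhs_recycled = rep(rhs, length.out = length(lhs_coerced))

# step 3: element-by-element comparison
manual_result = lhs_coerced == rhs_recycled
manual_result
#> [1]  TRUE  TRUE FALSE

# the manual steps agree with the automatic result
identical(manual_result, lhs == rhs)
#> [1] TRUE
```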
8.1.1 Testing for Equality
One of the operators above is of particular importance. The `==` operator can be said to check for equality. Note that this first introduces a minor confusion due to using the `=` symbol.
- `=` is used to assign a name to an object.4
- `==` is used to check for equality.
But more importantly, what does it mean to check for equality? Equality has many definitions, but they all have some notion of the “sameness” of two or more things. Mathematically, in simple cases we say two numbers are equal if they both represent the same quantity, but this notion can be extended to more complex mathematical objects.
How does R decide when things are equal? Let’s start by looking at two potentially surprising examples.
1 == 1L
#> [1] TRUE
Here we are comparing the double `1` to the integer `1`. This is a tricky case. Are they equal? In some sense, yes, because they both represent the number 1. However, they lack some “sameness” because they have a different type. So how does R decide? It doesn’t! This result is simply a side-effect of coercion. `1L`, the integer version of `1`, is first coerced to be the double `1` before the comparison is made!
0.1 + 0.2 == 0.3
#> [1] FALSE
This example is not a gotcha, but instead a side-effect of how computers deal with numbers. That is, floating-point arithmetic is here to cause problems again.
print(0.3, digits = 20)
#> [1] 0.2999999999999999889
print(0.1 + 0.2, digits = 20)
#> [1] 0.30000000000000004441
We do not recommend doing so in practice, but the above code can be used to see a slightly more accurate representation of what is stored when we use the numbers `0.3` and `0.1 + 0.2`.
To side-step this floating-point issue, we often check for equality while allowing for a tolerance, or very small difference, between two numbers. To do so in R, we use the `all.equal` function.
all.equal(target = 0.3, current = 0.1 + 0.2)
#> [1] TRUE
Here, we obtain the desired result. Use the following to view the relevant documentation:
?base::all.equal
By default, when used to compare numeric vectors, `all.equal` will allow for a tolerance of `sqrt(.Machine$double.eps)` between values.
sqrt(.Machine$double.eps)
#> [1] 1.490116e-08
The above shows the value of this quantity for the machine this chapter was rendered on.
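One detail is worth checking before relying on `all.equal` in your own code: when two values differ by more than the tolerance, it does not return `FALSE`. Instead, it returns a character description of the difference. The sketch below illustrates this and shows that wrapping the call in `isTRUE` gives a single `TRUE` or `FALSE`. The names `a` and `b` are ours, chosen for illustration.

```r
a = 0.1 + 0.2
b = 0.3

# the values are not exactly equal, but the gap is within the default tolerance
a == b
#> [1] FALSE
abs(a - b) < sqrt(.Machine$double.eps)
#> [1] TRUE

# when values are NOT nearly equal, all.equal returns a description, not FALSE
all.equal(1, 1.1)
#> [1] "Mean relative difference: 0.1"

# isTRUE turns either kind of result into a single TRUE or FALSE
isTRUE(all.equal(1, 1.1))
#> [1] FALSE
isTRUE(all.equal(a, b))
#> [1] TRUE
```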
1 == 1L
#> [1] TRUE
typeof(1)
#> [1] "double"
typeof(1L)
#> [1] "integer"
identical(1, 1L)
#> [1] FALSE
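The contrast between `==` and `identical` shown above extends to longer vectors. As a brief sketch with made-up values: `==` coerces and compares element by element, returning one logical value per element, while `identical` returns a single value and treats a difference in type as a difference.

```r
# == is vectorized and coerces types before comparing
c(1, 2, 3) == c(1L, 2L, 4L)
#> [1]  TRUE  TRUE FALSE

# identical() returns a single value and does not coerce
identical(c(1, 2, 3), c(1L, 2L, 3L))
#> [1] FALSE

# once the types match, the two objects are identical
identical(c(1, 2, 3), as.double(c(1L, 2L, 3L)))
#> [1] TRUE
```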
- TODO: `==`
- TODO: `all.equal`
- TODO: `identical`
- TODO: math vs memory
0.1 + 0.2 == 0.3
#> [1] FALSE
all.equal(0.1 + 0.2, 0.3)
#> [1] TRUE
8.1.2 Additional Relational Operators
?`%in%`
- TODO: `is.na`, `is.null`, `is.____`, `anyNA`
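Until this section is written, here is a short sketch of how the functions named above behave, using made-up values. It is meant only as orientation, not as a full treatment. Note in particular that comparing to `NA` with `==` does not detect missing values.

```r
# %in% checks whether each element on the left appears on the right
c(1, 5, 10) %in% c(1, 2, 3)
#> [1]  TRUE FALSE FALSE

# comparing to NA with == gives NA, not TRUE or FALSE
c(1, NA, 3) == NA
#> [1] NA NA NA

# is.na() is the way to find missing values, element by element
is.na(c(1, NA, 3))
#> [1] FALSE  TRUE FALSE

# anyNA() returns a single value
anyNA(c(1, NA, 3))
#> [1] TRUE

# is.null() tests for the NULL object
is.null(NULL)
#> [1] TRUE
```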
x1 = 0.5 - 0.3
x2 = 0.3 - 0.1
x1 == x2 # FALSE on most machines
#> [1] FALSE
isTRUE(all.equal(x1, x2)) # TRUE everywhere
#> [1] TRUE
8.2 Boolean Operators
If comparison operators are useful for creating logical vectors, Boolean operators are useful for operating on logical vectors.
- TODO: these are operations on logic values (they often work on numeric values, expect coercion)
- TODO: truth table
- TODO: “Logic on how to combine results of comparisons, i.e., logical vectors.”
?base::Logic
- TODO: vector in, vector out, vs vector in, vector of length one out
foo = c(TRUE, TRUE, TRUE)
bar = c(TRUE, FALSE, TRUE)
baz = c(FALSE, FALSE, FALSE)
c(any(foo), any(bar), any(baz))
#> [1] TRUE TRUE FALSE
?any
c(all(foo), all(bar), all(baz))
#> [1] TRUE FALSE FALSE
?all
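`any` and `all` are most useful when combined with a comparison. The sketch below, using a made-up vector `scores`, contrasts the "vector in, vector out" behavior of a comparison operator with the "vector in, vector of length one out" behavior of `any` and `all`.

```r
scores = c(6, 1, 2, 3, 5, 8) # made-up values

# vector in, vector out
scores > 4
#> [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE

# vector in, vector of length one out
any(scores > 4)
#> [1] TRUE
all(scores > 4)
#> [1] FALSE
all(scores > 0)
#> [1] TRUE
```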
x | y | !x | x & y | x \| y | xor(x, y) |
---|---|---|---|---|---|
TRUE | TRUE | FALSE | TRUE | TRUE | FALSE |
TRUE | FALSE | FALSE | FALSE | TRUE | TRUE |
FALSE | TRUE | TRUE | FALSE | TRUE | TRUE |
FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
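Because the Boolean operators are vectorized, the truth table can be reproduced in R by supplying every combination of `x` and `y` at once. This is a sketch; the column names in `truth_table` are our own labels.

```r
# every combination of two logical values
x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)

# each operator is applied element by element
truth_table = data.frame(
  x = x,
  y = y,
  not_x = !x,
  x_and_y = x & y,
  x_or_y = x | y,
  x_xor_y = xor(x, y)
)
truth_table
```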
- TODO: `which`, `which.min`, `which.max`
lgl_vec = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
which(lgl_vec)
#> [1] 3 4 7
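As a sketch of the related functions, with a made-up numeric vector `temps`: `which` is typically applied to the result of a comparison, while `which.min` and `which.max` take a numeric vector directly and return the position of the first minimum or maximum.

```r
temps = c(61, 57, 73, 84, 57, 67) # made-up values

# positions where a condition holds
which(temps > 65)
#> [1] 3 4 6

# position of the first minimum and of the first maximum
which.min(temps)
#> [1] 2
which.max(temps)
#> [1] 4

# the positions returned by which() can be used for subsetting
temps[which(temps > 65)]
#> [1] 73 84 67
```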
single_or = expression(TRUE | FALSE | FALSE | FALSE | FALSE | FALSE)
double_or = expression(TRUE || FALSE || FALSE || FALSE || FALSE || FALSE)
microbenchmark::microbenchmark(
  eval(single_or),
  eval(double_or),
  unit = "ns"
)
#> Unit: nanoseconds
#> expr min lq mean median uq max neval
#> eval(single_or) 2000 2100 2275 2100 2200 11300 100
#> eval(double_or) 1400 1500 1581 1500 1600 2500 100
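One likely contributor to the timing difference above is that `||` and `&&` evaluate from left to right and stop as soon as the result is known, while `|` and `&` always evaluate both sides. The sketch below makes this visible by placing a call to `stop` on the right-hand side; the error messages and the name `result` are ours.

```r
# || stops once the result is known, so stop() is never called
TRUE || stop("this is never evaluated")
#> [1] TRUE

# && behaves the same way when the left side is FALSE
FALSE && stop("this is never evaluated")
#> [1] FALSE

# | evaluates both sides, so the same kind of expression signals an error
result = tryCatch(
  TRUE | stop("this IS evaluated"),
  error = function(e) "error signaled"
)
result
#> [1] "error signaled"
```

Unlike `|` and `&`, the double operators are intended for single logical values, which is why they appear most often in conditions rather than in subsetting.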
- TODO: `isTRUE`, `isFALSE`
isTRUE(TRUE)
#> [1] TRUE
isTRUE(FALSE)
#> [1] FALSE
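The less obvious cases are inputs that are not a single `TRUE` or `FALSE`. As a brief sketch: `isTRUE` returns `TRUE` only for a length-one logical `TRUE`, and `isFALSE` only for a length-one logical `FALSE`, so missing values and longer vectors give `FALSE` from both.

```r
# NA is neither TRUE nor FALSE
isTRUE(NA)
#> [1] FALSE
isFALSE(NA)
#> [1] FALSE

# a vector longer than one is never TRUE in this sense
isTRUE(c(TRUE, TRUE))
#> [1] FALSE

# isFALSE() is the counterpart for a single FALSE
isFALSE(FALSE)
#> [1] TRUE
```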
8.3 Logical Subsetting
Now that we’ve introduced comparison and Boolean operators, we can quickly create useful logical vectors that will better highlight the power of logical subsetting.
- TODO: three examples: proportion, replacement, subset
- TODO: proportion between example
set.seed(42)
x = runif(n = 1000)
y = runif(n = 1000)
z = runif(n = 1000)
c(mean(x > 0.7), mean(y > 0.7), mean(z > 0.7))
#> [1] 0.293 0.311 0.308
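The same idea extends to the three uses mentioned in the notes above: a proportion, a replacement, and a subset. The sketch below reuses the seed and the `runif` call from the example; the cutoffs and the names `x_capped` and `small_values` are our own choices for illustration. Because the mean of a logical vector is the proportion of `TRUE` values, combining two comparisons with `&` gives the proportion of values in a range.

```r
set.seed(42)
x = runif(n = 1000)

# proportion: share of values strictly between 0.3 and 0.7
mean(x > 0.3 & x < 0.7)

# replacement: cap every value above 0.7 at 0.7
x_capped = x
x_capped[x_capped > 0.7] = 0.7
max(x_capped)
#> [1] 0.7

# subset: keep only the values below 0.1
small_values = x[x < 0.1]
all(small_values < 0.1)
#> [1] TRUE
```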
- TODO: example with data frames rows
airquality[airquality$Wind > 15, ]
#> Ozone Solar.R Wind Temp Month Day
#> 9 8 19 20.1 61 5 9
#> 18 6 78 18.4 57 5 18
#> 22 11 320 16.6 73 5 22
#> 25 NA 66 16.6 57 5 25
#> 34 NA 242 16.1 67 6 3
#> 48 37 284 20.7 72 6 17
#> 113 21 259 15.5 77 8 21
#> 129 32 92 15.5 84 9 6
#> 135 21 259 15.5 76 9 12
#> 148 14 20 16.6 63 9 25
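The `Wind` column has no missing values, so the subset above behaves as expected. When the column used in the comparison does contain `NA`, the logical vector contains `NA` too, and each `NA` produces a row of missing values in the result. The sketch below shows this with the `Ozone` column and two ways around it; the cutoff of 100 and the object names are our own.

```r
# Ozone contains missing values, so the comparison is NA for those rows
high_ozone = airquality$Ozone > 100

# each NA in the logical index yields a row filled with NA
with_na_rows = airquality[high_ozone, ]

# which() keeps only the positions that are TRUE
clean_rows = airquality[which(high_ozone), ]

# an equivalent approach using Boolean operators
also_clean = airquality[!is.na(high_ozone) & high_ozone, ]

# number of rows returned by each approach
c(nrow(with_na_rows), nrow(clean_rows), nrow(also_clean))
```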
airquality[sapply(airquality, anyNA)]
#> Ozone Solar.R
#> 1 41 190
#> 2 36 118
#> 3 12 149
#> 4 18 313
#> 5 NA NA
#> 6 28 NA
#> 7 23 299
#> 8 19 99
#> 9 8 19
#> 10 NA 194
#> 11 7 NA
#> 12 16 256
#> 13 11 290
#> 14 14 274
#> 15 18 65
#> 16 14 334
#> 17 34 307
#> 18 6 78
#> 19 30 322
#> 20 11 44
#> 21 1 8
#> 22 11 320
#> 23 4 25
#> 24 32 92
#> 25 NA 66
#> 26 NA 266
#> 27 NA NA
#> 28 23 13
#> 29 45 252
#> 30 115 223
#> 31 37 279
#> 32 NA 286
#> 33 NA 287
#> 34 NA 242
#> 35 NA 186
#> 36 NA 220
#> 37 NA 264
#> 38 29 127
#> 39 NA 273
#> 40 71 291
#> 41 39 323
#> 42 NA 259
#> 43 NA 250
#> 44 23 148
#> 45 NA 332
#> 46 NA 322
#> 47 21 191
#> 48 37 284
#> 49 20 37
#> 50 12 120
#> 51 13 137
#> 52 NA 150
#> 53 NA 59
#> 54 NA 91
#> 55 NA 250
#> 56 NA 135
#> 57 NA 127
#> 58 NA 47
#> 59 NA 98
#> 60 NA 31
#> 61 NA 138
#> 62 135 269
#> 63 49 248
#> 64 32 236
#> 65 NA 101
#> 66 64 175
#> 67 40 314
#> 68 77 276
#> 69 97 267
#> 70 97 272
#> 71 85 175
#> 72 NA 139
#> 73 10 264
#> 74 27 175
#> 75 NA 291
#> 76 7 48
#> 77 48 260
#> 78 35 274
#> 79 61 285
#> 80 79 187
#> 81 63 220
#> 82 16 7
#> 83 NA 258
#> 84 NA 295
#> 85 80 294
#> 86 108 223
#> 87 20 81
#> 88 52 82
#> 89 82 213
#> 90 50 275
#> 91 64 253
#> 92 59 254
#> 93 39 83
#> 94 9 24
#> 95 16 77
#> 96 78 NA
#> 97 35 NA
#> 98 66 NA
#> 99 122 255
#> 100 89 229
#> 101 110 207
#> 102 NA 222
#> 103 NA 137
#> 104 44 192
#> 105 28 273
#> 106 65 157
#> 107 NA 64
#> 108 22 71
#> 109 59 51
#> 110 23 115
#> 111 31 244
#> 112 44 190
#> 113 21 259
#> 114 9 36
#> 115 NA 255
#> 116 45 212
#> 117 168 238
#> 118 73 215
#> 119 NA 153
#> 120 76 203
#> 121 118 225
#> 122 84 237
#> 123 85 188
#> 124 96 167
#> 125 78 197
#> 126 73 183
#> 127 91 189
#> 128 47 95
#> 129 32 92
#> 130 20 252
#> 131 23 220
#> 132 21 230
#> 133 24 259
#> 134 44 236
#> 135 21 259
#> 136 28 238
#> 137 9 24
#> 138 13 112
#> 139 46 237
#> 140 18 224
#> 141 13 27
#> 142 24 238
#> 143 16 201
#> 144 13 238
#> 145 23 14
#> 146 36 139
#> 147 7 49
#> 148 14 20
#> 149 30 193
#> 150 NA 145
#> 151 14 191
#> 152 18 131
#> 153 20 223
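The column subset above works because `sapply(airquality, anyNA)` returns one logical value per column. As a sketch, it can help to store that vector and inspect it before subsetting; the name `has_na` is ours, and `head` is used only to keep the printed output short.

```r
# one logical value per column: does the column contain any NA?
has_na = sapply(airquality, anyNA)
has_na

# names of the columns that are kept by the subset
names(airquality)[has_na]
#> [1] "Ozone"   "Solar.R"

# negating the logical vector keeps the complete columns instead
head(airquality[!has_na], n = 3)
```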
8.4 Summary
- TODO: You’ve learned to…
8.5 What’s Next?
- TODO: control flow, etc.
8.6 TODO
- TODO: `?intersect` (set operations?)
- TODO: see also: Hands-On Programming with R: Boolean Operators
- TODO: `with()` and `within()`?
- TODO: `subset()` and related functions? (`transform`)
- TODO: warning about NSE?
- TODO: probably for a later chapter?
- TODO:
asdf = data.frame(
  x = 1:2,
  y = 3:4
)
within(asdf, {y = 5:6})
#> x y
#> 1 1 5
#> 2 2 6
with(asdf, {y * 5:6})
#> [1] 15 24
subset(
x = airquality,
subset = Wind >= 8 & Wind <= 8.5,
select = c("Ozone", "Wind")
)
#> Ozone Wind
#> 2 36 8
#> 27 NA 8
#> 44 23 8
#> 56 NA 8
#> 57 NA 8
#> 61 NA 8
#> 86 108 8
#> 101 110 8
#> 118 73 8
#> 143 16 8
#> 152 18 8
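For comparison, the same selection can be written with the bracket-based logical subsetting used earlier in this chapter. This is a sketch; the names `keep_rows`, `by_bracket`, and `by_subset` are ours. The two approaches agree here because `Wind` has no missing values. With missing values they can differ, since `subset` treats an `NA` in the condition as `FALSE`.

```r
# the same rows and columns using bracket subsetting
keep_rows = airquality$Wind >= 8 & airquality$Wind <= 8.5
by_bracket = airquality[keep_rows, c("Ozone", "Wind")]

# the subset() version from above, stored for comparison
by_subset = subset(
  x = airquality,
  subset = Wind >= 8 & Wind <= 8.5,
  select = c("Ozone", "Wind")
)

# both approaches select the same data
all.equal(by_bracket, by_subset)
#> [1] TRUE
```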
transform(head(airquality), new = -Ozone, Temp = (Temp - 32) / 1.8)
#> Ozone Solar.R Wind Temp Month Day new
#> 1 41 190 7.4 19.44444 5 1 -41
#> 2 36 118 8.0 22.22222 5 2 -36
#> 3 12 149 12.6 23.33333 5 3 -12
#> 4 18 313 11.5 16.66667 5 4 -18
#> 5 NA NA 14.3 13.33333 5 5 NA
#> 6 28 NA 14.9 18.88889 5 6 -28
1. Technically, other objects can work with these operators, but for the most part, assume only atomic vectors.
2. As an example, “Z” is the last letter of the alphabet in English, but in Estonian, “Z” comes between “S” and “T”.
3. Here we use `noquote` rather than `print` as doing so suppresses the `"` characters around each string.
4. In some sense, this has absolutely nothing to do with equality and is perhaps an argument for using `<-` rather than `=` as the assignment operator.