8  Logical Operations

This chapter was developed from scratch for the Fall 2022 semester. As such, you might notice a few extra typos, or some topics that are not well explained. If you encounter these issues, please let us know on the discussion forum. Expect some additional changes to what is currently published while this warning persists.

As we saw with subsetting, logical vectors play an important role in R. We will first introduce comparison operators, which are used to create logical vectors. Then we will discuss Boolean operators that, as the name suggests, perform operations on logical vectors. Using these two ideas together, we will return to logical subsetting, now that we are able to more effectively create useful logical vectors for subsetting.

After reading this chapter you should be able to:

8.1 Relational Operators

There are six binary comparison (sometimes called relational) operators in R.

  • x < y checks if x is less than y
  • x > y checks if x is greater than y
  • x <= y checks if x is less than or equal to y
  • x >= y checks if x is greater than or equal to y
  • x == y checks if x is equal to y
  • x != y checks if x is not equal to y

Each of the six operators expects two atomic vectors as input, one on each side of the operator.1 Each operator returns a logical vector that is the same length as the longest input vector.

For detailed documentation, use:

?base::Comparison

These operators perform the operation that you (probably) expect given numeric (integer or double) input. Consider the following numeric examples:

Operator  Example Expression  Evaluation
<         1.0 < 2.3           TRUE
>         1.0 > 2.3           FALSE
<=        1.0 <= 2.3          TRUE
>=        1.0 >= 2.3          FALSE
==        1.0 == 2.3          FALSE
!=        1.0 != 2.3          TRUE

With logical input, you can reliably predict the output if you assume that inputs are first coerced to numeric. That is, TRUE becomes 1 and FALSE becomes 0.

Operator  Example Expression  Evaluation
<         TRUE < FALSE        FALSE
>         TRUE > FALSE        TRUE
<=        TRUE <= FALSE       FALSE
>=        TRUE >= FALSE       TRUE
==        TRUE == FALSE       FALSE
!=        TRUE != FALSE       TRUE
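One way to convince yourself of this rule is to perform the coercion by hand. The following is a minimal sketch; the names lhs and rhs are ours, chosen only for illustration. It compares the result of > on logical input with the result on explicitly coerced numeric input.

```r
# hypothetical example vectors, not taken from the table above
lhs = c(TRUE, TRUE, FALSE, FALSE)
rhs = c(TRUE, FALSE, TRUE, FALSE)

# compare the logical values directly
direct = lhs > rhs

# coerce to numeric first (TRUE becomes 1, FALSE becomes 0), then compare
coerced = as.numeric(lhs) > as.numeric(rhs)

direct
coerced
identical(direct, coerced)
```

The same check can be repeated with any of the other five operators.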

These comparison operators can be used with string input (via character vectors) but come with several complications related to the locale under which R is run. A locale defines preferences based on a user’s language and location that often have an effect on how strings are processed. In this case, the possibility of different locales means that the same comparison performed under different locales can lead to different results.2

The following examples were evaluated using the en_US.UTF-8 locale.

Operator  Example Expression  Evaluation
<         "foo" < "bar"       FALSE
>         "foo" > "bar"       TRUE
<=        "foo" <= "bar"      FALSE
>=        "foo" >= "bar"      TRUE
==        "foo" == "bar"      FALSE
!=        "foo" != "bar"      TRUE

Character strings are compared using lexicographical order, which extends the idea of alphabetical order to cover letters, numbers, and additional symbols. To get a sense of this ordering in the en_US.UTF-8 locale, consider the following:3

string_order = sort(c(
  letters,
  LETTERS,
  0:9,
  c("!", "@", "#", "$", "%", "^", "&", "*", "(", ")"),
  c("`", "~", "-", "_", "=", "+"),
  c("[", "{", "]", "}", "\\", "|"),
  c(";", ":", "'", '"', ",", "<", ".", ">", "/", "?", " ")
))
noquote(string_order)
#>  [1]    _  -  ,  ;  :  !  ?  .  '  "  (  )  [  ]  {  }  @  *  /  \\ &  #  %  ` 
#> [26] ^  +  <  =  >  |  ~  $  0  1  2  3  4  5  6  7  8  9  a  A  b  B  c  C  d 
#> [51] D  e  E  f  F  g  G  h  H  i  I  j  J  k  K  l  L  m  M  n  N  o  O  p  P 
#> [76] q  Q  r  R  s  S  t  T  u  U  v  V  w  W  x  X  y  Y  z  Z
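To see that this ordering comes from the locale rather than from the strings themselves, we can compare the default sort with one that ignores the locale. The sketch below relies on the documented behavior that sort with method = "radix" always uses C-locale collation for character vectors. The vector strings is a hypothetical example, and the result of the default sort will depend on the machine it is run on.

```r
# a small, hypothetical vector of mixed-case strings
strings = c("b", "A", "a", "B")

# report the collation locale currently in use (output varies by machine)
Sys.getlocale("LC_COLLATE")

# default sort: follows the collation rules of the current locale
locale_sorted = sort(strings)

# radix sort: for character vectors, collation always follows the C locale,
# so all uppercase letters come before all lowercase letters
c_sorted = sort(strings, method = "radix")

locale_sorted
c_sorted
```

Under en_US.UTF-8 we would expect the two results to differ, since that locale interleaves lowercase and uppercase letters as shown above.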

So far, the examples we’ve considered compared two vectors of length one. But at this point, it should come as no surprise that these operators are vectorized. Some examples to illustrate this point:

c(6, 1, 2, 3, 5, 8) > c(1, 6, 2, 3, 5, 6)
#> [1]  TRUE FALSE FALSE FALSE FALSE  TRUE
1:10 <= pi
#>  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1:10 >= c(8, pi)
#>  [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
c(TRUE, TRUE, FALSE) == 1
#> [1]  TRUE  TRUE FALSE

As we’ve seen before, using vectorization often introduces type coercion and recycling. Consider a specific example from those above:

c(TRUE, TRUE, FALSE) == 1

First, note that on the left-hand side (LHS), we have a logical vector, and on the right-hand side (RHS), a double vector. Thus, type coercion must take place.

c(1, 1, 0) == 1 # after type coercion

There is a vector of length three on the LHS, but a vector of length one on the RHS. So as expected, recycling will also occur.

c(1, 1, 0) == c(1, 1, 1) # after recycling

Then, lastly, comparison is done element-by-element to arrive at the result.
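The three steps can also be carried out explicitly. This is only a sketch of what R does for us automatically; the intermediate names are ours, and in practice you would simply write the original comparison.

```r
lhs = c(TRUE, TRUE, FALSE)
rhs = 1

# step 1: type coercion, logical to double
lhs_coerced = as.numeric(lhs)

# step 2: recycling, repeat the shorter vector to the length of the longer one
rhs_recycled = rep_len(rhs, length.out = length(lhs_coerced))

# step 3: element-by-element comparison
manual = lhs_coerced == rhs_recycled

manual
identical(manual, lhs == rhs)
```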

8.1.1 Testing for Equality

Of the operators above, one is of particular importance. The == operator can be said to check for equality. Note that this first introduces a minor confusion due to using the = symbol.

  • = is used to assign a name to an object.4
  • == is used to check for equality.

But more importantly, what does it mean to check for equality? Equality has many definitions, but they all have some notion of the “sameness” of two or more things. Mathematically, in simple cases we say two numbers are equal if they both represent the same quantity, but this notion can be extended to more complex mathematical objects.

How does R decide when things are equal? Let’s start by looking at two potentially surprising examples.

1 == 1L
#> [1] TRUE

Here we are comparing the double 1 to the integer 1. This is a tricky case. Are they equal? In some sense, yes, because they both represent the number 1. However, they lack some “sameness” because they have a different type. So how does R decide? It doesn’t! This result is simply a side-effect of coercion. 1L, the integer version of 1, is first coerced to be the double 1 before the comparison is made!

0.1 + 0.2 == 0.3
#> [1] FALSE

This example is not a gotcha, but instead a side-effect of how computers deal with numbers. That is, floating-point arithmetic is here to cause problems again.

print(0.3, digits = 20)
#> [1] 0.2999999999999999889
print(0.1 + 0.2, digits = 20)
#> [1] 0.30000000000000004441

We do not recommend doing so in practice, but the above code can be used to see a slightly more accurate representation of what is stored when we use the numbers 0.3 and 0.1 + 0.2.

To side-step this floating-point issue, we often check for equality while allowing for a tolerance, or very small difference, between two numbers. To do so in R, we use the all.equal function.

all.equal(target = 0.3, current = 0.1 + 0.2)
#> [1] TRUE

Here, we obtain the desired result. Use the following to view the relevant documentation:

?base::all.equal

By default, when used to compare numeric vectors, all.equal will allow for a tolerance of sqrt(.Machine$double.eps) between values.

sqrt(.Machine$double.eps)
#> [1] 1.490116e-08

The above shows the value of this quantity for the machine this chapter was rendered on.
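To make the idea of a tolerance concrete, the following sketch checks the 0.1 + 0.2 example by hand. It is a simplification: for numeric input, all.equal compares a mean relative difference to the tolerance and handles several special cases that we ignore here. The variable names are ours.

```r
target = 0.3
current = 0.1 + 0.2

# the difference is not exactly zero ...
abs_diff = abs(target - current)
abs_diff > 0

# ... but it is far smaller than the default tolerance
tolerance = sqrt(.Machine$double.eps)
abs_diff < tolerance

# a relative difference, similar in spirit to what all.equal compares
rel_diff = abs_diff / abs(target)
rel_diff < tolerance
```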

1 == 1L
#> [1] TRUE
typeof(1)
#> [1] "double"
typeof(1L)
#> [1] "integer"
identical(1, 1L)
#> [1] FALSE
  • TODO: ==
  • TODO: all.equal
  • TODO: identical
  • TODO: math vs memory
0.1 + 0.2 == 0.3
#> [1] FALSE
all.equal(0.1 + 0.2, 0.3)
#> [1] TRUE
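One detail worth knowing when using all.equal in your own code: when the two inputs are not nearly equal, it does not return FALSE. Instead, it returns a character vector describing the difference. The sketch below, with variable names of our choosing, shows how wrapping the call in isTRUE gives a single TRUE or FALSE in either case.

```r
close_enough = all.equal(0.3, 0.1 + 0.2)
not_close = all.equal(1, 1.1)

close_enough   # TRUE
not_close      # a character string describing the difference, not FALSE

# isTRUE() converts either result to a single TRUE or FALSE
isTRUE(close_enough)
isTRUE(not_close)
```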

8.1.2 Additional Relational Operators

?`%in%`
  • TODO: is.na, is.null, is.____, anyNA
x1 = 0.5 - 0.3
x2 = 0.3 - 0.1
x1 == x2                   # FALSE on most machines
#> [1] FALSE
isTRUE(all.equal(x1, x2))  # TRUE everywhere
#> [1] TRUE
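Until this section is fully written, here is a brief sketch of some of the functions named above. The vector x is a hypothetical example. The main point is that comparisons involving NA produce NA, which is why dedicated functions for detecting missing values exist.

```r
# hypothetical data for illustration
x = c(2, NA, 5, 7)

# comparisons involving NA produce NA rather than TRUE or FALSE
x == 5

# %in% checks membership and returns FALSE, not NA, for the missing value here
x %in% c(5, 7)

# is.na() tests each element; anyNA() gives a single answer for the vector
is.na(x)
anyNA(x)

# is.null() tests the whole object and returns a vector of length one
is.null(x)
is.null(NULL)
```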

8.2 Boolean Operators

If comparison operators are useful for creating logical vectors, Boolean operators are useful for operating on logical vectors.

  • TODO: these are operations on logic values (they often work on numeric values, expect coercion)
  • TODO: truth table
  • TODO: “Logic on how to combine results of comparisons, i.e., logical vectors.”
?base::Logic
  • TODO: vector in, vector out, vs vector in, vector of length one out
foo = c(TRUE, TRUE, TRUE)
bar = c(TRUE, FALSE, TRUE)
baz = c(FALSE, FALSE, FALSE)
c(any(foo), any(bar), any(baz))
#> [1]  TRUE  TRUE FALSE
?any
c(all(foo), all(bar), all(baz))
#> [1]  TRUE FALSE FALSE
?all
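The distinction between "vector in, vector out" and "vector in, vector of length one out" can be sketched as follows, reusing two of the vectors defined above. The operators &, |, and ! work element-by-element, while any and all summarize an entire logical vector.

```r
foo = c(TRUE, TRUE, TRUE)
bar = c(TRUE, FALSE, TRUE)

# vector in, vector out: one result per element
foo & bar
foo | bar
!bar

# vector in, vector of length one out: a single summary of the whole vector
all(foo & bar)
any(!bar)
```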
x      y      !x     x & y  x | y  xor(x, y)
TRUE   TRUE   FALSE  TRUE   TRUE   FALSE
TRUE   FALSE  FALSE  FALSE  TRUE   TRUE
FALSE  TRUE   TRUE   FALSE  TRUE   TRUE
FALSE  FALSE  TRUE   FALSE  FALSE  FALSE
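Because these operators are vectorized, a truth table like this one can be generated in R. The sketch below is one possible approach; the column names are ours, and the rows appear in the order produced by expand.grid rather than the order used in the table.

```r
# all four combinations of two logical values
truth = expand.grid(x = c(TRUE, FALSE), y = c(TRUE, FALSE))

# add one column per operator
truth$not_x = !truth$x
truth$and = truth$x & truth$y
truth$or = truth$x | truth$y
truth$xor = xor(truth$x, truth$y)

truth
```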
  • TODO: which, which.min, which.max
log_vec = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
which(log_vec)
#> [1] 3 4 7
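In practice, which is usually applied to the result of a comparison rather than to a logical vector typed by hand. The following sketch uses a hypothetical numeric vector named temps and also shows the related functions which.min and which.max, which return the position of the first minimum and the first maximum.

```r
# hypothetical numeric data
temps = c(61, 57, 73, 57, 84, 77)

# positions where a comparison is TRUE
which(temps > 70)

# position of the first minimum and of the first maximum
which.min(temps)
which.max(temps)

# the positions returned by which() can be used for subsetting
temps[which(temps > 70)]
```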
single_or = expression(TRUE | FALSE | FALSE | FALSE | FALSE | FALSE)
double_or = expression(TRUE || FALSE || FALSE || FALSE || FALSE || FALSE)
microbenchmark::microbenchmark(
  eval(single_or), 
  eval(double_or),
  unit = "ns"
)
#> Unit: nanoseconds
#>             expr  min     lq    mean median   uq   max neval
#>  eval(single_or) 1517 1574.5 1750.39   1680 1813  3646   100
#>  eval(double_or) 1129 1178.5 1466.25   1211 1375 16609   100
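A likely reason for the timing difference above is that || evaluates its operands from left to right and stops as soon as the result is known, while | always evaluates both sides. The sketch below illustrates this short-circuit behavior with a hypothetical helper function, noisy, that prints a message whenever it is evaluated. Note that || is meant for inputs of length one; recent versions of R signal an error when given longer vectors.

```r
# a hypothetical helper that announces when it is evaluated
noisy = function(value) {
  message("evaluating ", value)
  value
}

# |: both sides are always evaluated, so two messages are printed
single_result = noisy(TRUE) | noisy(FALSE)

# ||: the right side is skipped because the left side already decides the result
double_result = noisy(TRUE) || noisy(FALSE)

# short-circuiting means the right side is never run, even if it would fail
safe_result = TRUE || stop("this is never evaluated")
```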
  • TODO: isTRUE, isFALSE

8.3 Logical Subsetting

Now that we’ve introduced comparison and Boolean operators, we can quickly create useful logical vectors that will better highlight the power of logical subsetting.
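As a first small illustration, consider the following sketch, which uses a comparison to both subset and replace elements of a vector. The vector measurements is hypothetical.

```r
# hypothetical measurements where negative values are not meaningful
measurements = c(-2, 5, -1, 3, 0)

# the comparison creates the logical vector used for subsetting
is_negative = measurements < 0

# subset: keep only the elements where the logical vector is TRUE
measurements[!is_negative]

# replacement: assign to only the elements where the logical vector is TRUE
measurements[is_negative] = 0
measurements
```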

  • TODO: three examples: proportion, replacement, subset

  • TODO: proportion between example

set.seed(42)
x = runif(n = 1000)
y = runif(n = 1000)
z = runif(n = 1000)
c(mean(x > 0.7), mean(y > 0.7), mean(z > 0.7))
#> [1] 0.293 0.311 0.308
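The reason mean returns a proportion here is coercion: TRUE becomes 1 and FALSE becomes 0, so the sum of a logical vector is a count and its mean is a proportion. The sketch below uses a small hypothetical vector u so that the results are easy to verify by hand, and combines two comparisons with & to find the proportion of values between two bounds.

```r
# hypothetical values between 0 and 1
u = c(0.12, 0.48, 0.91, 0.35, 0.66, 0.05, 0.73, 0.50)

# TRUE is coerced to 1 and FALSE to 0, so sum() counts and mean() averages
sum(u > 0.7)    # number of values above 0.7
mean(u > 0.7)   # proportion of values above 0.7

# combine two comparisons with & to find the proportion between two bounds
between = u > 0.3 & u < 0.7
mean(between)
```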
  • TODO: example with data frames rows
airquality[airquality$Wind > 15, ]
#>     Ozone Solar.R Wind Temp Month Day
#> 9       8      19 20.1   61     5   9
#> 18      6      78 18.4   57     5  18
#> 22     11     320 16.6   73     5  22
#> 25     NA      66 16.6   57     5  25
#> 34     NA     242 16.1   67     6   3
#> 48     37     284 20.7   72     6  17
#> 113    21     259 15.5   77     8  21
#> 129    32      92 15.5   84     9   6
#> 135    21     259 15.5   76     9  12
#> 148    14      20 16.6   63     9  25
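The Wind variable has no missing values, so the subset above behaves as expected. When the variable used in the comparison does contain NA, as Ozone does, the logical vector contains NA and the result includes rows filled with NA. The following sketch shows two possible ways to avoid this; the variable names are ours.

```r
# Ozone contains missing values, so this comparison contains NA
high_ozone = airquality$Ozone > 100

# subsetting with NA in the logical vector returns extra rows filled with NA
with_na_rows = airquality[high_ozone, ]

# option 1: which() keeps only the positions that are TRUE
via_which = airquality[which(high_ozone), ]

# option 2: explicitly require that the value is not missing
via_is_na = airquality[!is.na(airquality$Ozone) & high_ozone, ]

nrow(with_na_rows)
nrow(via_which)
```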
airquality[sapply(airquality, anyNA)]
#>     Ozone Solar.R
#> 1      41     190
#> 2      36     118
#> 3      12     149
#> 4      18     313
#> 5      NA      NA
#> 6      28      NA
#> 7      23     299
#> 8      19      99
#> 9       8      19
#> 10     NA     194
#> 11      7      NA
#> 12     16     256
#> 13     11     290
#> 14     14     274
#> 15     18      65
#> 16     14     334
#> 17     34     307
#> 18      6      78
#> 19     30     322
#> 20     11      44
#> 21      1       8
#> 22     11     320
#> 23      4      25
#> 24     32      92
#> 25     NA      66
#> 26     NA     266
#> 27     NA      NA
#> 28     23      13
#> 29     45     252
#> 30    115     223
#> 31     37     279
#> 32     NA     286
#> 33     NA     287
#> 34     NA     242
#> 35     NA     186
#> 36     NA     220
#> 37     NA     264
#> 38     29     127
#> 39     NA     273
#> 40     71     291
#> 41     39     323
#> 42     NA     259
#> 43     NA     250
#> 44     23     148
#> 45     NA     332
#> 46     NA     322
#> 47     21     191
#> 48     37     284
#> 49     20      37
#> 50     12     120
#> 51     13     137
#> 52     NA     150
#> 53     NA      59
#> 54     NA      91
#> 55     NA     250
#> 56     NA     135
#> 57     NA     127
#> 58     NA      47
#> 59     NA      98
#> 60     NA      31
#> 61     NA     138
#> 62    135     269
#> 63     49     248
#> 64     32     236
#> 65     NA     101
#> 66     64     175
#> 67     40     314
#> 68     77     276
#> 69     97     267
#> 70     97     272
#> 71     85     175
#> 72     NA     139
#> 73     10     264
#> 74     27     175
#> 75     NA     291
#> 76      7      48
#> 77     48     260
#> 78     35     274
#> 79     61     285
#> 80     79     187
#> 81     63     220
#> 82     16       7
#> 83     NA     258
#> 84     NA     295
#> 85     80     294
#> 86    108     223
#> 87     20      81
#> 88     52      82
#> 89     82     213
#> 90     50     275
#> 91     64     253
#> 92     59     254
#> 93     39      83
#> 94      9      24
#> 95     16      77
#> 96     78      NA
#> 97     35      NA
#> 98     66      NA
#> 99    122     255
#> 100    89     229
#> 101   110     207
#> 102    NA     222
#> 103    NA     137
#> 104    44     192
#> 105    28     273
#> 106    65     157
#> 107    NA      64
#> 108    22      71
#> 109    59      51
#> 110    23     115
#> 111    31     244
#> 112    44     190
#> 113    21     259
#> 114     9      36
#> 115    NA     255
#> 116    45     212
#> 117   168     238
#> 118    73     215
#> 119    NA     153
#> 120    76     203
#> 121   118     225
#> 122    84     237
#> 123    85     188
#> 124    96     167
#> 125    78     197
#> 126    73     183
#> 127    91     189
#> 128    47      95
#> 129    32      92
#> 130    20     252
#> 131    23     220
#> 132    21     230
#> 133    24     259
#> 134    44     236
#> 135    21     259
#> 136    28     238
#> 137     9      24
#> 138    13     112
#> 139    46     237
#> 140    18     224
#> 141    13      27
#> 142    24     238
#> 143    16     201
#> 144    13     238
#> 145    23      14
#> 146    36     139
#> 147     7      49
#> 148    14      20
#> 149    30     193
#> 150    NA     145
#> 151    14     191
#> 152    18     131
#> 153    20     223

8.4 Summary

  • TODO: You’ve learned to…

8.5 What’s Next?

  • TODO: control flow, etc.

8.6 TODO

  • TODO: ?intersect (set operations?)
  • TODO: see also: Hands-On Programming with R: Boolean Operators
  • TODO: with() and within()?
    • TODO: subset() and related functions? (transform)
    • TODO: warning about NSE?
    • TODO: probably for a later chapter?
asdf = data.frame(
  x = 1:2,
  y = 3:4
)
within(asdf, {y = 5:6})
#>   x y
#> 1 1 5
#> 2 2 6
with(asdf, {y * 5:6})
#> [1] 15 24
subset(
  x = airquality,
  subset = Wind >= 8 & Wind <= 8.5,
  select = c("Ozone", "Wind")
)
#>     Ozone Wind
#> 2      36    8
#> 27     NA    8
#> 44     23    8
#> 56     NA    8
#> 57     NA    8
#> 61     NA    8
#> 86    108    8
#> 101   110    8
#> 118    73    8
#> 143    16    8
#> 152    18    8
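For comparison, the sketch below reproduces the subset call using logical subsetting with [. The two approaches should agree here because Wind has no missing values; in general, subset drops rows where the condition evaluates to NA, while [ does not. The variable names are ours.

```r
# subset() lets us refer to columns by name inside the condition
via_subset = subset(
  x = airquality,
  subset = Wind >= 8 & Wind <= 8.5,
  select = c("Ozone", "Wind")
)

# the same rows and columns using logical subsetting with [
in_range = airquality$Wind >= 8 & airquality$Wind <= 8.5
via_bracket = airquality[in_range, c("Ozone", "Wind")]

all.equal(via_subset, via_bracket)
```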
transform(head(airquality), new = -Ozone, Temp = (Temp - 32) / 1.8)
#>   Ozone Solar.R Wind     Temp Month Day new
#> 1    41     190  7.4 19.44444     5   1 -41
#> 2    36     118  8.0 22.22222     5   2 -36
#> 3    12     149 12.6 23.33333     5   3 -12
#> 4    18     313 11.5 16.66667     5   4 -18
#> 5    NA      NA 14.3 13.33333     5   5  NA
#> 6    28      NA 14.9 18.88889     5   6 -28

  1. Technically, other objects can work with these operators, but for the most part, assume only atomic vectors.

  2. As an example, “Z” is the last letter of the alphabet in English, but in Estonian, “Z” comes between “S” and “T”.

  3. Here we use noquote rather than print as doing so suppresses the " characters around each string.

  4. In some sense, this has absolutely nothing to do with equality and is perhaps an argument for using <- rather than = as the assignment operator.