Atomic R

Programming for Data



Welcome to Atomic R!

This book is designed for use in STAT 385, Statistical Programming Methods, at the University of Illinois Urbana-Champaign. It is a work in progress, so use it outside of STAT 385 at your own risk. You may see “TODO” scattered throughout, with some notes to the author. These mark information that is still to be added or updated.

Why R?

Truthfully, because the author was trained by statisticians and R is a language written by statisticians for statisticians.

Many who are new to R might wonder: Why not Python? No reason really, except that this is a book about R. Both R and Python are useful languages for interacting with data. If you’d like to learn Python, go for it! Realistically, it would be incredibly useful to learn both R and Python. Wouldn’t hurt to give Julia a try either.

Some of the best features of R include:

  • The data.frame data structure, used to store tabular data for visualization and modeling.
  • Strong built-in support for probability, statistics, and modeling. No external packages needed!
  • The fantastic ggplot2 package, which implements Leland Wilkinson’s Grammar of Graphics.
  • The (expressive) dplyr and (super-fast) data.table packages for manipulating tabular data.
  • Noticing a pattern? R has a huge community that contributes excellent packages, many of which are hosted via CRAN, the Comprehensive R Archive Network. Installing well-tested binaries of packages is often as easy as install.packages("data.table")!1
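
As a small illustration of the first two points, the following sketch uses only what ships with R: the built-in airquality data set, which is a data.frame, and the lm() function from the stats package, which is loaded by default. The choice of variables here is arbitrary and only meant as an example.

```r
# airquality is a data.frame that ships with R (datasets package)
aq = head(airquality)
class(aq)

# columns are accessed by name; each column is a vector
mean(aq$Temp)

# summary statistics and a simple linear model, no external packages needed
summary(airquality$Wind)
fit = lm(Ozone ~ Temp, data = airquality)
coef(fit)
```

Rows with a missing Ozone value are dropped by lm() by default, so the model is fit to the complete cases only.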

Why Base R?

If you have some prior R experience, you might also wonder: Why not start with the tidyverse? If this question doesn’t mean anything to you, skip the rest of this subsection.

First, we are not saying you should not learn the tidyverse, only that we believe learning Base R first has some advantages. Our recommendation is to start with Base R, then explore both the tidyverse and data.table, which is sometimes said to be part of the tinyverse.

We won’t bore you with all of the details of our reasoning for this approach.2 Broadly, Base R has been around since 1993. While we’re on version 4.2.3 now, Base R has been stable for quite some time. The tidyverse is much newer. While it is relatively stable, and we believe it will be around for a long time, it does go through somewhat frequent changes. Two principles lead us to a cautious approach to all things new:


Source code and its output will be written in a monospaced4 font.

Source code will be displayed with a subtle grey background. For example:

# this is some source code
x = 1:10
y = x + c(1, -1)

Code output will be displayed with a subtle blue background. Additionally, output lines will be prefixed with #>. For example:

#> [1] "This is some output."

Often, output will immediately follow its source.

x = 1:10
y = x + c(1, -1)
x + y
#>  [1]  3  3  7  7 11 11 15 15 19 19

There will usually be some additional hints from R that the output is output, like the [1] seen above. These context clues will become more obvious over time as you progress through the book and learn about how R prints objects.
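
To see what the [1] means, it helps to print a vector long enough to wrap across several lines. In the sketch below, the vector z is an arbitrary example, not something used elsewhere in the book. Each printed line begins with the index, in square brackets, of the first element shown on that line.

```r
# a vector long enough to wrap across several printed lines
z = 101:140

# each printed line starts with the index of its first element,
# so the first line always starts with [1]
print(z)
```

Exactly where the lines break depends on the width of your console, so the indices after [1] may differ from one machine to another.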

Sometimes, objects will be printed as a tree to better understand their structure.

#> S3<data.frame>
#> ├─Ozone<int [6]>: 41, 36, 12, 18, NA, 28
#> ├─Solar.R<int [6]>: 190, 118, 149, 313, NA, NA
#> ├─Wind<dbl [6]>: 7.4, 8, 12.6, 11.5, 14.3, 14.9
#> ├─Temp<int [6]>: 67, 72, 74, 62, 56, 66
#> ├─Month<int [6]>: 5, 5, 5, 5, 5, 5
#> └─Day<int [6]>: 1, 2, 3, 4, 5, 6
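
The tree above resembles what the lobstr package prints for the first six rows of the built-in airquality data set, though which tool produced it is an assumption here. A comparable overview is available without installing anything, via the str() function:

```r
# first six rows of a data set that ships with R
aq = head(airquality)

# compact display of the object's structure: its class, its dimensions,
# and the type and first few values of each column
str(aq)
```

Both displays convey the same idea: a data.frame is a collection of named columns, each of which is a vector of a single type.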

Standing on the Shoulders of Giants

This section could also be titled: Why Do We Need Another Book About R? Well, we really don’t. But we find it useful to write “books” that specifically match the content of courses. However, with that in mind, the following is an incomplete list of other R resources, mostly books, you might find useful.5

We would like to give special mention to two books in particular. Had we known these existed when we began writing this text, we might never have bothered, as they are very similar in style and approach. As such, they come with the highest possible recommendation.

  1. Need binaries on Linux? Allow me to introduce you to r2u.

  2. Although, I suppose it would be good to expand on this at some point.

  3. Thanks to Dave Zhao for bringing this idea to our attention.

  4. A monospaced font is a font in which every character occupies the same amount of horizontal space on a screen. Using one is standard practice when coding. By contrast, a proportional font, which you might see when typing prose, allows different characters to occupy differing amounts of horizontal space. Monospaced fonts may also be referred to as fixed-width fonts.

  5. Additional resources will someday be collected in an appendix.