Atomic R
Programming for Data
Welcome
Welcome to Atomic R!
is.atomic("R")
#> [1] TRUE
This book is designed for use in STAT 385, Statistical Programming Methods, at the University of Illinois Urbana-Champaign. It is a work in progress, so use it outside of STAT 385 at your own risk. You may see “TODO” scattered throughout, with some notes to the author. These mark information that is still to be added or updated.
Why R?
Truthfully, because the author was trained by statisticians and R is a language written by statisticians for statisticians.
Many who are new to R might wonder: Why not Python? No reason really, except that this is a book about R. Both R and Python are useful languages for interacting with data. If you’d like to learn Python, go for it! Realistically, it would be incredibly useful to learn both R and Python. Wouldn’t hurt to give Julia a try either.
Some of the best features of R include:
- The data.frame data structure, used to store tabular data for visualization and modeling.
- Strong built-in support for probability, statistics, and modeling. No external packages needed!
- The fantastic ggplot2 package, which implements Leland Wilkinson’s Grammar of Graphics.
- The (expressive) dplyr and (super-fast) data.table packages for manipulating tabular data.
- Noticing a pattern? R has a huge community that contributes excellent packages, many of which are hosted via CRAN, the Comprehensive R Archive Network. Installing well tested binaries of packages is often as easy as install.packages("data.table")!1
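As a small illustration of the first two points in the list above, the following sketch uses only Base R and the built-in airquality data set. The names aq and fit are chosen here purely for illustration, and the choice of a linear model of Ozone on Temp is an assumption made to show built-in modeling, not an analysis from the book.

```r
# airquality ships with R in the datasets package, so nothing is installed
aq = head(airquality, 10)

# a data.frame is tabular: rows are observations, columns are variables
is.data.frame(aq)
dim(aq)

# built-in statistics: mean of a column, ignoring missing values
mean(aq$Ozone, na.rm = TRUE)

# built-in modeling: simple linear regression of Ozone on Temp
fit = lm(Ozone ~ Temp, data = airquality)
coef(fit)
```

Everything here runs in a fresh R session without any call to install.packages().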
Why Base R?
If you have some prior R experience, you might also wonder: Why not start with the tidyverse? If this question doesn’t mean anything to you, skip the rest of this subsection.
First, we would like to point out that we are not saying you should not learn the tidyverse, just that we believe that learning Base R first has some advantages. Our recommendation is to start with Base R, then explore both the tidyverse and data.table, which is sometimes said to be part of the tinyverse.
We won’t bore you with all of the details of our reasoning for this approach.2 Broadly, Base R has been around since 1993. While we’re on version 4.2.3 now, Base R has been stable for quite some time. The tidyverse is much newer. While it is relatively stable, and we believe it will be around for a long time, it does go through somewhat frequent changes. Two principles lead us to a cautious approach to all things new:
- 枯れた技術の水平思考, which translates to Lateral Thinking with Withered Technology, a core principle of the Nintendo Corporation.3
- The Lindy effect, which was popularized by Nassim Taleb.
Conventions
Source code and its output will be written in a monospaced font.4
Source code will be displayed with a subtle grey background. For example:
# this is some source code
x = 1:10
y = x + c(1, -1)
Code output will be displayed with a subtle blue background. Additionally, output lines will be prefixed with #>. For example:
#> [1] "This is some output."
Often, output will immediately follow its source.
x = 1:10
y = x + c(1, -1)
x + y
#> [1] 3 3 7 7 11 11 15 15 19 19
There will usually be some additional hints from R that the output is output, like the [1] seen above. These context clues will become more obvious over time as you progress through the book and learn about how R prints objects.
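To see where the values in the output above come from, the sketch below recomputes them step by step. It relies on R's vector recycling, which this section does not explain: the shorter vector c(1, -1) is repeated until it matches the length of x. The name y_explicit is introduced here only for illustration.

```r
x = 1:10

# c(1, -1) has length 2, so R recycles it five times to match x;
# writing the recycled vector out by hand gives the same y
y = x + c(1, -1)
y_explicit = x + rep(c(1, -1), times = 5)
identical(y, y_explicit)

# the [1] prefix is the index of the first element on that output line;
# vectors long enough to wrap get a new index at the start of each line
print(x + y)
print(1:30)
```

How many elements fit on a line before wrapping depends on the width of your console, so the indices shown for 1:30 may differ from one session to another.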
Sometimes, objects will be printed as a tree to better understand their structure.
lobstr::tree(head(airquality))
#> S3<data.frame>
#> ├─Ozone<int [6]>: 41, 36, 12, 18, NA, 28
#> ├─Solar.R<int [6]>: 190, 118, 149, 313, NA, NA
#> ├─Wind<dbl [6]>: 7.4, 8, 12.6, 11.5, 14.3, 14.9
#> ├─Temp<int [6]>: 67, 72, 74, 62, 56, 66
#> ├─Month<int [6]>: 5, 5, 5, 5, 5, 5
#> └─Day<int [6]>: 1, 2, 3, 4, 5, 6
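lobstr is a contributed package, so the call above assumes it has already been installed. As a rough Base R counterpart, str() gives a similar compact view of an object's structure. It is offered here only as an illustration, not as a convention used elsewhere in the book, and the name aq is chosen for the example.

```r
# str() is part of Base R (the utils package) and needs no installation
aq = head(airquality)
str(aq)

# the same structural facts can be checked programmatically
class(aq)
sapply(aq, typeof)
```

The column types reported by typeof() correspond to the int and dbl labels in the tree above.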
Standing on the Shoulders of Giants
This section could also be titled: Why Do We Need Another Book About R? Well, we really don’t. But we find it useful to write “books” that specifically match the content of courses. However, with that in mind, the following is an incomplete list of other R resources, mostly books, you might find useful.5
- Hands-On Programming with R
  - Garrett Grolemund
- R for Data Science
  - Hadley Wickham & Garrett Grolemund
- Advanced R
  - Hadley Wickham
- R Packages
  - Hadley Wickham
- R Markdown: The Definitive Guide
  - Yihui Xie, J. J. Allaire, Garrett Grolemund
- Mastering Shiny
  - Hadley Wickham
- Happy Git and GitHub for the useR
  - Jennifer Bryan, The STAT 545 TAs, Jim Hester
- STAT 545: Data Wrangling, Exploration, and Analysis with R
  - Jennifer Bryan, The STAT 545 TAs
- R Graphics Cookbook
  - Winston Chang
- Efficient R Programming
  - Colin Gillespie, Robin Lovelace
- What They Forgot to Teach You About R
  - Jennifer Bryan, Jim Hester
- The R Inferno
  - Patrick Burns
- The R Manuals
  - A collection of “manuals” written by the R Development Core team.
- Posit Cheatsheets
  - A collection of “cheatsheets” for using R, RStudio, and various R packages.
We would like to give special mention to two books in particular. Had we known these existed when we began writing this text, we may have never bothered, as they are very similar in style and approach. As such, they come with the highest possible recommendation.
- R Coding Basics: An Introduction to the Basics of Coding in R
  - Gaston Sanchez
- Deep R Programming
  - Marek Gagolewski
Although, I suppose it would be good to expand on this at some point.↩︎
Thanks to Dave Zhao for bringing this idea to our attention.↩︎
A monospaced font is a font such that each possible character occupies the same amount of horizontal space on a screen. This is standard practice when coding. This is opposed to a proportional font you might see when typing prose that allows different characters to occupy differing amounts of horizontal space. Monospace fonts may also be referred to as fixed-width fonts.↩︎
Additional resources will someday be collected in an appendix.↩︎