Chapter 1 What you need to be able to do in R before you start

These days most people have used R a bit, but it is possible you’re a bit rusty, or have no R experience. This isn’t a problem: below I will try to summarise what you need to be able to do to get these exercises running. I am not, however, going to write a full introductory guide to R here, so if you’ve never used R before or find it a bit of a struggle, I strongly recommend the book Getting Started With R. There are also lots of great tutorials online.

Throughout, R code will be in shaded boxes:

library(ape)

R output will be preceded by ## and important comments will be in quote blocks:

Note that many things in R can be done in multiple ways. You should choose the methods you feel most comfortable with. Don’t panic if I do it differently!

1.1 Installing R (and RStudio)

1.2 Downloading the data

The data download is zipped, and the data and trees for each practical exercise in the book are in their own folder. To work through the exercises you need to download all the files into a folder somewhere on your computer. Don’t forget to unzip the download before starting.

[WHEN FINISHED CLARIFY THE LOCATION OF THE DATA]

Some datasets are built into R and/or various R packages (see below), so you may see me access these just using code like this (cars is a dataset built into R):

data(cars)

Note that this won’t produce any output unless we then ask it to do something with the data, for example print the first six rows…

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

1.3 R Projects

One feature of R that tends to trip up beginners is that to read data into R you need to tell R where to look for the data. In the past we might have solved this problem by setting the working directory, or writing the full path to a file into our code. This works fine, but if you later move the folder or the data, or send it to a collaborator/supervisor the code stops working. Luckily RStudio has a great solution to the problem: R Projects. If you’re working in an R Project, R will automatically look for data in the folder you’ve chosen to be the project folder!

What’s a Project? A “Project” is just a folder that contains a sensible unit of work, for example your undergraduate or Masters thesis, a chapter of your PhD thesis, a paper, or a coursework project. Projects will likely contain several files, including data and R scripts. You may also want to store information related to the report/thesis chapter/coursework that you’re writing.

To make an R Project for practical exercises in 05-PGLSinR, for example, go to the menu in RStudio and click File > New Project. You should now see a dialogue box with three options: New Directory, Existing Directory, or Version Control (Directory just means a folder on your computer). Because we already have a folder choose the option Existing Directory. The dialogue box moves on so we can use the Browse button to locate the folder 05-PGLSinR. Finally, click the Create Project button.

This will cause a few things to happen. First, RStudio may ask if you want to save any unsaved files if you were already doing something in RStudio. Save them if they are important! Second, RStudio will look like it restarted. And third, a new file will appear in the folder: the .Rproj file, with the same file name as your R Project. When you want to use that R Project in future just double click the .Rproj icon in the folder and it will open the R Project in RStudio for you.

To save you some time we have made an .Rproj file for each chapter of this book. They should have been downloaded with the data. To use these just navigate to the appropriate chapter folder and double click the .Rproj icon to start.
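If you want to check that an R Project is pointing at the folder you expect, something like the following sketch may help. It only uses base R functions; the file name Primatedata.csv is just an example, so swap in a file you know is in your project folder.

```r
# Show the folder R is currently working in.
# Inside an R Project this should be the project folder.
project_folder <- getwd()
project_folder

# List the files R can see in that folder
list.files()

# Check whether a file can be found using just its name (no full path).
# "Primatedata.csv" is only an example file name.
found <- file.exists("Primatedata.csv")
found
```

If the last line returns FALSE, either the file is not in the project folder or you are not working inside the R Project you thought you were.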

1.4 Using a script

We can use R straight from the Console (the bottom left hand window in a standard RStudio set up), but this is not ideal for various reasons. Firstly, if we want to run a series of separate commands, and perhaps run them again in the future, or alter one of the commands thus needing to re-run all the subsequent ones, then typing commands into the Console is a pain. Equally we might want to save our commands for future use. To do this we use a script.

To get a new script in RStudio use the menus to go to File > New File > R Script. You should type (or copy and paste) your code into the script and edit it until you think it’ll work. Then either paste it into R’s console window, or highlight the bit of code you want to run and press Ctrl + Enter (Cmd + Enter on a Mac). On some set ups the shortcut uses R rather than Enter (different computers seem to do this differently). This will automatically send the code to the console.

Saving the script file lets you keep a record of the code you used, which can be a great time saver if you want to use it again, especially as you know this code will work!

You can cut and paste code from these materials into your script. You don’t need to retype everything!

If you want to add comments to the file (i.e., notes to remind yourself what the code is doing), put a hash/pound sign (#) in front of the comment. Comments are really important to help you remember what you did and why. Always write comments, and generally write more than you think you will need. Future you will thank past you when you need to rerun these analyses in six months!

# Comments are ignored by R but remind you what the code is doing. 
# You need a # at the start of each line of a comment.
# Always make plenty of notes to help you remember what you did and why
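As a rough illustration of how comments and code sit together in a script, here is a very short example using the built-in cars dataset. The object name mean_dist is just something I made up for this sketch.

```r
# Load the cars dataset that is built into R
data(cars)

# Calculate the mean stopping distance (the dist column)
# and store it so it can be reused later in the script
mean_dist <- mean(cars$dist)

# Print the result to the Console
mean_dist
```

You can run this one line at a time, or highlight the whole thing and send it to the Console in one go.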

1.5 Installing and loading extra packages in R

You’ll notice at the start of each exercise I remind you of the packages you need to install and load. To run any specialised analysis in R, you need to download one or more additional packages on top of the basic R installation. You need to be connected to the internet to do this.

To install the package ape:

install.packages("ape")

Pick the closest mirror to you if asked. Do not put this code into your script as you only need to do this once (until you need to update R), and it’s a waste of your time to do it each time you run your script.

You can also install more than one package at a time:

install.packages(c("ape", "picante"))
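If you can’t remember whether you have already installed a package, you can ask R before installing anything. The sketch below is one possible way to do this using requireNamespace; the helper function missing_packages is a name I have invented for illustration, not a function from any package.

```r
# Hypothetical helper: return the names of packages that are NOT yet installed
missing_packages <- function(pkgs) {
  # requireNamespace returns TRUE if the package can be found, FALSE if not
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check the two packages used in the example above
to_install <- missing_packages(c("ape", "picante"))

if (length(to_install) == 0) {
  message("Everything is already installed")
} else {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
```

You could then run install.packages(to_install) in the Console to install only the packages that are missing.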

Alternatively, you can use the Packages tab in RStudio (in the bottom right hand window in a standard RStudio set up). If you click the Install button this will open a pop up where you can type in a list of the packages you want to install, and then just click Install. If you do this, you’ll see the appropriate install.packages code appear in your console.

If everything worked you will see some text appearing in the console (often a lot if you’re installing a big package like tidyverse or ggtree), and perhaps the occasional warning message, but importantly no error messages. Don’t let the amount of text worry you. If you’re in doubt about whether a package has installed correctly, go to the next step (see Loading packages into R below) to check.

If you see an error message (these will contain the preface Error:, see the section below on Errors and warnings for more details), then something, somewhere has gone wrong. Installation errors will generally say something about non-zero exit status of the package, which just tells you that it was not installed. You might need to scroll up a bit to find this.

Some common issues that might arise when installing packages are:

  1. R asks if you would like to update other packages that the package you are installing needs to work. You might see the question Update all/some/none? [a/s/n]:. In that case type a and press enter, and R will update these packages too.

  2. R asks Do you want to install from sources the packages which need compilation? (Yes/no/cancel) or similar. In this case, type Yes or y or whatever the suggested option is for yes. However, if this doesn’t work and the installation gives you an error, try installing again and this time choose the option for no. Some packages will need to be compiled from source. If you’re using a Mac you may also need to install clang from this page https://cran.r-project.org/bin/macosx/tools/ before this will work. This can be tricky on a Mac so you may need to use Google to find out how to fix things if this will not work for you.

  3. An annoying error you may encounter (generally on a Mac), is that you keep seeing error messages when trying to install some packages, even after trying the solutions above. This might be because you’re installing via the RStudio default server for packages. To fix this, change your default by going to RStudio > Preferences > Packages. You should see a box with CRAN mirror above it. Change this from Global (CDN) RStudio to your closest mirror on the list and click Apply. This should fix the problem and only needs to be done once. Note that this issue may be fixed soon by RStudio, but has been an issue in the current version.

  4. If you see non-zero exit status and the name of a package you weren’t trying to install, it just means that this package is needed for the package you are trying to install to work. In these cases, try installing that package, then if that works try again with the package you wanted to install.

  5. If it still won’t work, it’s also worth trying to install packages both in RStudio, and just in the normal version of R you downloaded. For this there is no Packages tab so you need to use install.packages.

If you get other kinds of errors don’t panic. Check you are connected to the internet. Read the error message carefully to see if it gives you any clues. If that fails, try pasting the error message into Google and seeing if anyone else has had the same error.
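The CRAN mirror fix in point 3 above can also be done with code rather than through the RStudio menus. This is only a sketch: I have used the main CRAN site as the example address, and you would normally replace it with the address of the mirror closest to you. The setting lasts for the current R session only.

```r
# See which repository R currently uses when installing packages
getOption("repos")

# Tell R to use a particular CRAN site for the rest of this session.
# The main CRAN site is used here purely as an example; swap in your
# closest mirror if you prefer.
options(repos = c(CRAN = "https://cran.r-project.org"))

# Check that the change has been made
getOption("repos")
```

If you would rather pick from a list, running chooseCRANmirror() in the Console will show the available mirrors (you need to be connected to the internet for this).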

1.5.1 Loading packages into R

You’ve installed the packages but they don’t automatically get loaded into your R session. Instead you need to tell R to load them every time you start a new R session and want to use functions from these packages. To load the package ape into your current R session:

library(ape)

You can think of install.packages like installing an app from the App Store on your smart phone - you only do this once - and library as being like pushing the app button on your phone - you do this every time you want to use the app (credit to Dylan Childs for this great analogy!).

For many packages, you won’t see any message at all when you run library(package_name). That means the code has worked. Some packages will print a message explaining some feature of the package, or telling you that it is also loading some other package, or highlighting that the package was written for a newer version of R. As long as you don’t see an error message, everything is fine.

If you run library(package_name) and you get the error message Error in library(package_name) : there is no package called ‘package_name’, R could not find the package, which means you have not installed it properly. Often just going back and trying to install it again will work. Or check out the list of possible solutions above.
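Because many packages load silently, it can be reassuring to check which packages are currently loaded in your session. One way to do this (a sketch, not the only approach) is with search(), which lists everything R has attached. The helper is_attached is a made up name for this example.

```r
# List everything attached in the current R session.
# Loaded packages appear as "package:name".
search()

# Hypothetical helper: TRUE if a package has been loaded with library()
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

# The base package is always attached
is_attached("base")

# ape will only be attached once you have run library(ape)
is_attached("ape")
```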

1.5.2 ggtree, a special case for package installation

Packages are generally installed from CRAN, the same place you downloaded R from. But some communities share their packages in different ways. ggtree is one such package. To install ggtree we need to use Bioconductor instead as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ggtree")

You still need to load it using library as usual. Note that ggtree will produce a long message when you first load it, telling you how to cite the package and listing some functions it contains that have the same names as functions in other packages (I’ve suppressed this here).

library(ggtree)

1.6 Loading and viewing your data in R

R can read files in lots of formats, but for simplicity we’ll use comma-delimited or .csv files in these exercises. Excel (and many other applications) can output files in this format (it’s an option in the Save As dialogue box under the File menu).

As an example, here is how you would read in a comma-delimited text file called Primatedata.csv using the readr package:

library(readr)
primatedata <- read_csv("Primatedata.csv")

This is a good point to note that unless you tell R you want to do something, it won’t do it automatically. So here if you successfully entered the data, R won’t give you any indication that it worked. Instead you need to specifically ask R to look at the data.

We can look at the data by typing:

str(primatedata)

Or, if we are using the tidyverse packages:

library(tidyverse)
glimpse(primatedata)

I’ve suppressed the output here to keep the document smaller. Always look at your data before beginning any analysis to check it read in correctly. I will cover this in more detail in later sections.
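If you don’t have the exercise data to hand yet, you can practise the read-then-check routine on a tiny made up dataset. Note that this sketch uses read.csv from base R rather than read_csv from readr, so that it runs without any extra packages. The species names and masses are invented purely for illustration.

```r
# Write a tiny made up dataset to a temporary .csv file
example_file <- tempfile(fileext = ".csv")
writeLines(c("species,mass_g",
             "A,10.5",
             "B,20.1",
             "C,15.0"),
           example_file)

# Read the file into R (base R equivalent of read_csv)
example_data <- read.csv(example_file)

# Look at the structure: column names, types and the first few values
str(example_data)

# Look at the first few rows
head(example_data)

# Check the number of rows and columns are what you expect
dim(example_data)
```

The things to check are that the number of rows and columns match your file, and that numbers have been read in as numbers rather than as text.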

1.7 Errors and warnings

Errors are where the code fails to run and you see an error message prefaced in R by Error:. Often these mistakes are caused by typos, missing commas, or missing parentheses. R is very pedantic so this happens a lot! But they can also be caused by applying functions incorrectly or to the wrong kinds of data. Error messages are sometimes helpful, but often pretty baffling especially for beginners.

If you get an error message don’t panic. Read the message and see if you can work out what the issue is. Check your code carefully for typos. Make sure you’re using the correct data and variable names etc. If you can’t work out what the error means, try Googling it (remove any words specific to your data from it first). Google may be able to tell you what the issue is. If that still doesn’t help, try asking colleagues or post a question online.

The important thing about errors is that the code does not run. Warnings, on the other hand, appear when the code runs fine, but R wants to alert us to something, for example that the code might not be doing what you intended, or R has recognised that something you asked it to do is a bit risky. Do not ignore warning messages! Always read the warning message and try to understand what it means. You may figure out that you can safely ignore the warning (I will try and note these situations in the exercises here). Otherwise you probably need to try and fix it.
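To see the difference between a warning and an error for yourself, you could try something like the following. This is just a small base R illustration; I have wrapped the error in try() so that the rest of the code keeps running, which you would not normally need to do when typing into the Console.

```r
# A warning: the code runs and returns a result, but R flags a problem.
# "three" cannot be converted to a number so it becomes NA.
x <- as.numeric(c("1", "2", "three"))
x

# An error: the code does not run, so no result is produced.
# log() cannot take the logarithm of text.
res <- try(log("a"), silent = TRUE)

# try() stores the error rather than stopping, so we can confirm it happened
inherits(res, "try-error")
```

In the first case you still get an object x to work with, although it contains a missing value you probably didn’t want. In the second case there is no usable result at all.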

1.7.1 Removing the +

Some R functions are pretty long, meaning that they need to appear on more than one line in the Console. In a long function, you therefore see a + in front of all lines except the first one. This tells you that R is running the code together.

Unfortunately, if you accidentally fail to finish a line of code (for example by not closing quotation marks or parentheses), R will assume you haven’t finished and will add a + on the next line. If that wasn’t your intention, and you try to run another bit of code on the next line, then that next line of code won’t work, leading to an error.

Don’t worry, this happens all the time! If you’re getting error messages and can’t see why, just check that the Console is showing the prompt > not the + at the start of your line of code. If you see the +, just hit the Escape key to get back to the prompt > and then go back to your code.

1.8 Summary

This covers the main introductory elements of R and means you should be able to run the code in the exercises.