Chapter 9 Macroevolutionary models for discrete traits

The aims of this exercise are to learn how to use R to fit macroevolutionary models in R to discrete traits.

We will be using the evolution of head shape in natricine snakes as an example. The data and modified tree come from V. Deepak, Gower, and Cooper (2023), and the tree comes from V. Deepak et al. (2021). I’ve removed a few species and a few variables to make things a bit more straightforward. If you want to see the full results check out V. Deepak, Gower, and Cooper (2023)!

Before you start

  • Open the 09-ModelsDiscrete.RProj file in the 09-ModelsDiscrete folder to open your R Project for this exercise.

You will also need to install the following packages:

  • tidyverse - for reading, manipulating and plotting data
  • ape - functions for reading, plotting and manipulating phylogenies
  • geiger - to fit models of evolution
  • phytools - for plots of transition rates

9.1 Preparation

To begin we need to load the packages for this practical.

# Load the packages
library(tidyverse)
library(ape)
library(geiger)
library(phytools)

Next we need to prepare the tree and data for the analyses. In the 04-Preparation exercise we read in our tree and data, checked them, and matched them so only species in both were retained. Please refer to that exercise for more details on how and why we do these things, or run through it now if you haven’t previously.

It is important to do these things before beginning a phylogenetic comparative analysis, so let’s run through that code again here.

# Read in the data
snakedata <- read_csv("data/snake-heads.csv")
# Check everything loaded corrected
glimpse(snakedata)
## Rows: 206
## Columns: 9
## $ Species          <chr> "Adelophis_foxi", "Afronatrix_anoscopus_2", "Amphiesma_beddomei", "…
## $ Ecomorph         <chr> "Burrowing", "Aquatic", "Terrestrial", "Terrestrial", "Terrestrial"…
## $ Diet             <chr> "annelids", "aquatic generalist", "anurans", "anurans", "generalist…
## $ ReproductiveMode <chr> "Viviparous", "Oviparous", "Oviparous", "Oviparous", "Oviparous", "…
## $ HeadLength       <dbl> 12.560, 25.100, 15.060, 20.105, 19.950, 18.810, 16.375, 22.640, 17.…
## $ HeadWidth        <dbl> 6.60, 15.80, 7.82, 9.13, 10.32, 9.82, 8.39, 11.30, 9.04, 9.55, 4.49…
## $ HeadHeight       <dbl> 5.43, 10.30, 5.40, 6.61, 7.56, 5.80, 6.40, 8.70, 5.68, 6.64, 4.19, …
## $ EyeDistance      <dbl> 4.01, 7.40, 4.85, 5.85, 5.84, 5.42, 6.00, 6.05, 5.01, 6.39, 3.12, 3…
## $ TotalLength      <dbl> 105, 194, 106, 185, 165, 155, NA, 163, 130, 309, 31, 62, 56, 35, 16…

To load the tree we will use read.nexus.

# Read in the tree
snaketree <- read.nexus("data/snake-tree.nex")
# Check it loaded correctly
str(snaketree)
## List of 4
##  $ edge       : int [1:496, 1:2] 250 251 252 253 254 255 256 257 258 259 ...
##  $ edge.length: num [1:496] 4.695 2.225 0.587 0.307 0.832 ...
##  $ Nnode      : int 248
##  $ tip.label  : chr [1:249] "Rhabdophis_leonardi" "Rhabdophis_nuchalis_1" "Rhabdophis_nuchalis_2" "Rhabdophis_adleri_1" ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

Remember to check the tree is dichotomous, i.e. has no polytomies, rooted, and ultrametric.

# Check whether the tree is binary
# We want this to be TRUE
is.binary(snaketree)
## [1] TRUE
# Check whether the tree is rooted 
# We want this to be TRUE 
is.rooted(snaketree)
## [1] TRUE
# Check whether the tree is ultrametric
# We want this to be TRUE 
is.ultrametric(snaketree)
## [1] TRUE

Next check that the species names match up in the tree and the data. This should reveal any typos and/or taxonomic differences that need to be fixed before going any further. I haven’t printed check here as we already did this in the previous exercise and it’s a long list but we have already checked these in advance. Do not skip this step for your own analyses!

# Check whether the names match in the data and the tree
check <- name.check(phy = snaketree, data = snakedata, 
                    data.names = snakedata$Species)
# check

Finally remove species that are not in the tree and the data and ensure the data is a data frame.

# Remove species missing from the data
mytree <- drop.tip(snaketree, check$tree_not_data)

# Remove species missing from the tree
matches <- match(snakedata$Species, check$data_not_tree, nomatch = 0)
mydata <- subset(snakedata, matches == 0)

# Look at the tree summary
str(mytree)
## List of 4
##  $ edge       : int [1:322, 1:2] 163 164 165 166 167 168 169 170 171 172 ...
##  $ edge.length: num [1:322] 4.695 2.225 0.587 0.307 0.832 ...
##  $ Nnode      : int 161
##  $ tip.label  : chr [1:162] "Rhabdophis_leonardi" "Rhabdophis_nuchalis_3" "Rhabdophis_swinhonis_1" "Rhabdophis_nigrocinctus" ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"
# Look at the data
glimpse(mydata)
## Rows: 162
## Columns: 9
## $ Species          <chr> "Adelophis_foxi", "Afronatrix_anoscopus_2", "Amphiesma_beddomei", "…
## $ Ecomorph         <chr> "Burrowing", "Aquatic", "Terrestrial", "Terrestrial", "Terrestrial"…
## $ Diet             <chr> "annelids", "aquatic generalist", "anurans", "anurans", "generalist…
## $ ReproductiveMode <chr> "Viviparous", "Oviparous", "Oviparous", "Oviparous", "Oviparous", "…
## $ HeadLength       <dbl> 12.560, 25.100, 15.060, 20.105, 18.810, 16.375, 22.640, 19.760, 10.…
## $ HeadWidth        <dbl> 6.60, 15.80, 7.82, 9.13, 9.82, 8.39, 11.30, 9.55, 4.03, 4.90, 3.44,…
## $ HeadHeight       <dbl> 5.43, 10.30, 5.40, 6.61, 5.80, 6.40, 8.70, 6.64, 3.24, 4.41, 3.12, …
## $ EyeDistance      <dbl> 4.01, 7.40, 4.85, 5.85, 5.42, 6.00, 6.05, 6.39, 3.34, 2.27, 1.70, 3…
## $ TotalLength      <dbl> 105, 194, 106, 185, 155, NA, 163, 309, 62, 35, 16, 23, NA, 33, 81, …
# Convert to a dataframe
mydata <- as.data.frame(mydata)
# Check this is now a data frame
class(mydata)
## [1] "data.frame"

Overall we have 162 species in the data and the tree.

Now we’re ready to run our analyses!

9.2 Models of evolution for discrete traits

For fitting simple models of evolution to discrete data we will use the fitDiscrete function in the R package geiger. fitDiscrete is a likelihood based method, so the output will give the maximum likelihood (ML) estimates of the parameters.

As an example, let’s look at the evolution of ecomorphs in our snake data. Before we start let’s make sure we have an idea of what these data look like.

# How many species are in each category of ecomorph?
mydata %>%
  group_by(Ecomorph) %>%
  summarise(number = n())
## # A tibble: 5 × 2
##   Ecomorph          number
##   <chr>              <int>
## 1 Aquatic               43
## 2 Aquatic Burrowing     11
## 3 Burrowing             17
## 4 Semiaquatic           54
## 5 Terrestrial           37

We can also visualize these variables on our tree by plotting them with colours. We’ll then create a vector of five colours for the ecomorphs. These need to be in alphabetical order, so the colours should be in the order Aquatic, Aquatic burrower, Burrower, Semi-aquatic, Terrestrial.

# Set up list of colours
ecomorph.colors <- c("darkblue", "plum", "chocolate", "lightblue", "lightgreen")

We then need to make sure the variable Ecomorph is a factor rather than a character as R deals with these differently and we need it to be a factor for the next bit of code to work. We can do this using the function as.factor and we can change it in mydata usign the tidyverse function mutate.

# Make Ecomorph a factor
mydata <- 
  mydata %>%
  mutate(Ecomorph = as.factor(Ecomorph))

Finally we can plot the tree with coloured tip labels to match the ecomorphs. I’ve rearranged the colour order in the legend so it makes more sense biologically (i.e. putting the aquatic and semi-aquatic next to each other rather than ordering it alphabetically). To make things a bit easier to read I’ve left off the tip labels. I’ve done this using ape plotting functions but you can also do this using ggtree.

# Plot the tree with colours at the tips to represent ecomorph
plot(mytree, cex = 0.5, adj = c(0.2), type = "fan", 
     no.margin = TRUE, show.tip.label = FALSE)
tiplabels(pch = 16, col = ecomorph.colors[mydata$Ecomorph])
legend("bottomleft", pch = 15, bty = "n",
       legend = c("Aquatic", "Semi-aquatic", "Aquatic burrower", 
                  "Burrower", "Terrestrial"), 
       col = c("darkblue", "lightblue", "plum", 
               "chocolate", "lightgreen"))

Some ecomorphs are clustered, but others appear across the phylogeny (this is the result of some pretty cool convergent evolution - check out V. Deepak, Gower, and Cooper (2023) for more details).

9.2.1 Fitting the ER, SYM and ARD models of evolution using fitDiscrete

We’ll fit three commonly used evolutionary models to the data; the equal rates (ER) model, the symmetric rates (SYM) model and the all rates different (ARD) model.

As a quick reminder, in the ER model, transitions between any pair of states occur at the same rate and are therefore equally probable. This is also known as the Mk1 model because it as a Markov model (Mk) with only 1 parameter (k).

In the SYM model transitions between any pair of states occurs at the same rate regardless of direction, but that the rate of change differs among states. A symmetric model with only 2 states becomes an equal rates (Mk1) model.

Finally, in the ARD model all transitions between pairs of states occurs at different rates. It’s important to be wary of over-fitting with this model as it can result in a lot of rates when you have more than a few states.

For more details on these models please see the Primer.

We are going to use the package geiger to fit models in this exercise, so we need the additional preparation step we first met in chapter 05-PhyloSignal. We need to create an object in R that only contains the variable required, and the species names (so we can match it up to the tree).

We use the function pull to extract just the ecomorph values. We then name these values with the species names from mydata using the function names. Note that this requires the trait data is in the same order as the tree tip labels so we need to reorder the data first…

# Check first few tip labels and species
mytree$tip.label[1:5]
## [1] "Rhabdophis_leonardi"     "Rhabdophis_nuchalis_3"   "Rhabdophis_swinhonis_1" 
## [4] "Rhabdophis_nigrocinctus" "Rhabdophis_tigrinus"
mydata$Species[1:5]
## [1] "Adelophis_foxi"         "Afronatrix_anoscopus_2" "Amphiesma_beddomei"    
## [4] "Amphiesma_monticola"    "Amphiesma_stolatum_1"
# These are different so we reorder the data by the tips
mydata <- mydata[match(mytree$tip.label, mydata$Species), ]
# Check this now matches the tip label order
mydata$Species[1:5]
## [1] "Rhabdophis_leonardi"     "Rhabdophis_nuchalis_3"   "Rhabdophis_swinhonis_1" 
## [4] "Rhabdophis_nigrocinctus" "Rhabdophis_tigrinus"
# Create ecomorph containing just ecomorph values
ecomorph <- pull(mydata, Ecomorph)
# Make sure ecomorph is a character, not a factor
ecomorph <- as.character(ecomorph)
# Look at the first few rows
head(ecomorph)
## [1] "Semiaquatic" "Terrestrial" "Semiaquatic" "Terrestrial" "Terrestrial" "Terrestrial"
# Give log head length names = species names at the tips of the phylogeny
names(ecomorph) <- mydata$Species
# Look at the first few rows
head(ecomorph)
##           Rhabdophis_leonardi         Rhabdophis_nuchalis_3        Rhabdophis_swinhonis_1 
##                 "Semiaquatic"                 "Terrestrial"                 "Semiaquatic" 
##       Rhabdophis_nigrocinctus           Rhabdophis_tigrinus Rhabdophis_tigrinus_lateralis 
##                 "Terrestrial"                 "Terrestrial"                 "Terrestrial"

Now we have a list of values with associated species names.

To fit the ER model we can then use the code below.

# Fit the ER model
equal <- fitDiscrete(mytree, ecomorph, model = "ER")

Let’s look at the output for the equal rates model:

# Look at the output
equal
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                            Aquatic Aquatic Burrowing    Burrowing  Semiaquatic  Terrestrial
##     Aquatic           -0.034973712       0.008743428  0.008743428  0.008743428  0.008743428
##     Aquatic Burrowing  0.008743428      -0.034973712  0.008743428  0.008743428  0.008743428
##     Burrowing          0.008743428       0.008743428 -0.034973712  0.008743428  0.008743428
##     Semiaquatic        0.008743428       0.008743428  0.008743428 -0.034973712  0.008743428
##     Terrestrial        0.008743428       0.008743428  0.008743428  0.008743428 -0.034973712
## 
##  model summary:
##  log-likelihood = -176.839091
##  AIC = 355.678181
##  AICc = 355.703181
##  free parameters = 1
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  number of iterations with same best fit = 100
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

This looks very similar to the output from fitContinuous in the 08-ModelsContinuous exercise. We’ve got a model summary, with log likelihoods and AIC scores, convergence diagnostics and an object summary. The only major difference is the first part, which gives us a fitted transition matrix or \(Q\) matrix, rather than a summary of model parameters. In fact the \(Q\) matrix is the model parameters for these models as it contains all the rates of change between different states. This is an equal rates model, so the off-diagonal elements are all same and show that the rate of change from one state to any other state is 0.00874.

Why are the diagonal elements of the \(Q\) matrix -0.034973712? The rows of the \(Q\) matrix must sum to zero: -0.034973712 + (4 * 0.008743428) = 0.

To fit the SYM model we just switch the model name. Note that this might be a bit slow on your computer as we have quite a lot of different ecomorphs, so a lot of different rates to fit. A good point to grab a coffee/tea/biscuit!

# Fit the SYM model
sym <- fitDiscrete(mytree, ecomorph, model = "SYM")
# Look at the output
sym
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                             Aquatic Aquatic Burrowing     Burrowing   Semiaquatic
##     Aquatic           -3.410257e-02      6.787882e-03  8.536690e-03  1.877800e-02
##     Aquatic Burrowing  6.787882e-03     -6.787882e-03  1.163160e-19  3.309475e-18
##     Burrowing          8.536690e-03      1.163160e-19 -1.172452e-02  3.187830e-03
##     Semiaquatic        1.877800e-02      3.309475e-18  3.187830e-03 -6.519697e-02
##     Terrestrial        1.664242e-23      9.507585e-18  2.981126e-19  4.323114e-02
##                         Terrestrial
##     Aquatic            1.664242e-23
##     Aquatic Burrowing  9.507585e-18
##     Burrowing          2.981126e-19
##     Semiaquatic        4.323114e-02
##     Terrestrial       -4.323114e-02
## 
##  model summary:
##  log-likelihood = -148.281974
##  AIC = 316.563949
##  AICc = 318.020902
##  free parameters = 10
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  number of iterations with same best fit = 4
##  frequency of best fit = 0.04
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

Again the \(Q\) matrix is the key output here. You could just read the numbers, we can see this more clearly if we plot the \(Q\) matrix for the model:

# Plot Q matrix as network
plot.gfit(sym)

The transition rate between Semi-aquatic and Terrestrial is highest (0.043), followed by transitions between Semi-aquatic and Aquatic (0.019). Burrowing and Aquatic (0.009) and Aquatic Burrowing and Aquatic (0.007) have the next highest rates, followed by Burrowing and Semi-aquatic (0.003). All the other rates are < 0.001 so they appear as zeros on this plot

Finally let’s fit the ARD model. Again this may take some time! Go and grab a tea/coffee/biscuit or take a short break.

# Fit the ARD model
ard <- fitDiscrete(mytree, ecomorph, model = "ARD")
# Look at the output
ard
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                             Aquatic Aquatic Burrowing     Burrowing   Semiaquatic
##     Aquatic           -3.871262e-02      7.710796e-03  1.070672e-02  2.029510e-02
##     Aquatic Burrowing  3.070007e-16     -3.085515e-16  1.550706e-18  7.000260e-38
##     Burrowing          4.008351e-19      4.966449e-23 -4.116883e-19  4.081239e-23
##     Semiaquatic        1.592909e-02      3.156430e-19  2.496600e-03 -5.538242e-02
##     Terrestrial        2.472419e-19      3.097357e-21  1.042375e-20  2.673830e-03
##                         Terrestrial
##     Aquatic            4.967164e-21
##     Aquatic Burrowing  2.883901e-24
##     Burrowing          1.076274e-20
##     Semiaquatic        3.695673e-02
##     Terrestrial       -2.673830e-03
## 
##  model summary:
##  log-likelihood = -145.506018
##  AIC = 331.012035
##  AICc = 336.969482
##  free parameters = 20
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  number of iterations with same best fit = 1
##  frequency of best fit = 0.01
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

Again the \(Q\) matrix is the key output. We can plot the \(Q\) matrix for the model:

# Plot Q matrix as network
plot.gfit(ard)

This is quite complex because in the ARD model we have different rates for each type of state change. Here, for example, the bottom left hand arrows joining Semi-aquatic and Terrestrial show that rates of change from Semi-aquatic to Terrestrial (0.037) are higher than those from Terrestrial to Semi-aquatic (0.003). These plots can be a little easier to read if we hide all the transitions where rates are < 0.001 as follows.

# Plot Q matrix as network without zeros
plot.gfit(ard, show.zeros = FALSE)

For now these plots are pretty simple, but you can add some code to make the arrows different sizes etc. if you need to using the code here http://blog.phytools.org/2020/09/graphing-fitted-m-k-model-with.html. Also check out future versions of phytools which should have these functions added.

9.2.2 Comparing models of evolution using AIC

As we discussed in the continuous traits exercise in 08-ModelsContinuous, we often want to know which of the models fits our variable best. We can use fitDiscrete to fit the models we are interested in and then compare them using AIC. We can extract the AICs from the models we fitted above as follows:

equal$opt$aic
## [1] 355.6782
sym$opt$aic
## [1] 316.5639
ard$opt$aic
## [1] 331.012

The “best” model is the one with the smallest AIC. Here the best model of evolution appears to be the SYM model, whereby rates of transition are the same in both directions, but are different for the different pairs of ecomorphs.

Alternatively we can use \(\Delta\)AIC or AIC weights to compare our models using the following code and the geiger function aicw:

aic.scores <- setNames(c(equal$opt$aic, sym$opt$aic, ard$opt$aic), 
                       c("equal", "symmetric", "different"))
aicw(aic.scores)
##                fit    delta            w
## equal     355.6782 39.11423 3.207273e-09
## symmetric 316.5639  0.00000 9.992717e-01
## different 331.0120 14.44809 7.283186e-04

aicw outputs the AIC (fit), \(\Delta\)AIC (delta) and AIC weights (w) for each of the models we fitted. The best model is the model with \(\Delta\)AIC = 0 or with AICw closest to 1. Using \(\Delta\)AIC we can conclude that the SYM model is the best fit to the data.

9.3 Ancestral state estimations

I’m not a big fan of ancestral state estimation as you probably realised from the Primer! There are many reasons to be highly sceptical of ancestral state estimates and interpretations of macroevolutionary patterns and process that are based on them. However, if you want to know if evolutionary tempo or mode have varied over clade history based on the state of a discrete trait as we intend to do in the next chapter, you’ll need to do it so you can assign each node to a particular state.

We’ll use ape’s ace function here. There are other options out there, for example in the phytools package, but ace will work for our purposes.

To perform ancestral state estimation of ecomorphs for snakes we can use the ecomorph object we made earlier and used to fit the ER, SYM and ARD models. Recall that ecomorph is just the Ecomorph column from the snake heads dataset, in the same order as the tips in the tree, with names equal to the tip labels/Species names in the tree.

# Take a quick look at the first few rows of ecomorph
head(ecomorph)
##           Rhabdophis_leonardi         Rhabdophis_nuchalis_3        Rhabdophis_swinhonis_1 
##                 "Semiaquatic"                 "Terrestrial"                 "Semiaquatic" 
##       Rhabdophis_nigrocinctus           Rhabdophis_tigrinus Rhabdophis_tigrinus_lateralis 
##                 "Terrestrial"                 "Terrestrial"                 "Terrestrial"

To perform an ancestral state estimation of ecomorphs under the symmetric model (the best fitting model for the data as discovered above) we use this code:

# Ancestral state estimation of ecomorph under the SYM model
ancestral_ecomorphs <- ace(ecomorph, mytree, type = "discrete", model = "SYM")
## Warning in log(comp[-TIPS]): NaNs produced
## Warning in sqrt(diag(solve(h))): NaNs produced

You should notice a series of warning messages appear that say NaNs produced and NA/Inf replaced by maximum positive value. Don’t worry about this – it happens when rates for one transition are particularly low (and we know from above that many of the transition rates were < 0.001) but doesn’t really affect our node state estimates.

ace now defaults to a joint estimation procedure, where the ancestral states are optimized based on all information in the tree, not just the states at descendant tips. Older versions of ace did not do this, so be wary of this when reading older critiques of this method

We can look at the first few rows of our ancestral state estimates by typing:

head(ancestral_ecomorphs$lik.anc)

In this matrix the rows correspond to nodes in the tree (although the numbering is bit off; we don’t need to worry about this here but it will come up in the next exercise) and the five columns give the scaled likelihoods that the node is in each of the different ecomorphs.

The scaled likelihoods are like probabilities, so for the first node, the probability of it being Aquatic is highest (probability = 0.5417), followed by the probability of it being Semi-aquatic (0.3587), Terrestrial (0.0755), Burrowing (0.0206), or Aquatic Burrowing (0.0035). These scaled likelihoods can be visualized with pie charts on the tree we plotted earlier byt adding the nodelabels line of code below.

# Plot the tree with colours at the tips and nodes to represent ecomorph
plot(mytree, cex = 0.5, adj = c(0.2), type = "fan", 
     no.margin = TRUE, show.tip.label = FALSE)
tiplabels(pch = 16, col = ecomorph.colors[mydata$Ecomorph])
nodelabels(pie = ancestral_ecomorphs$lik.anc, piecol = ecomorph.colors, cex = 0.5)
legend("bottomleft", pch = 15, bty = "n",
       legend = c("Aquatic", "Semi-aquatic", "Aquatic burrower", 
                  "Burrower", "Terrestrial"), 
       col = c("darkblue", "lightblue", "plum", 
               "chocolate", "lightgreen"))

Aquatic is most commonly estimated as the ancestral ecomorph for natricine snakes, with Semi-aquatic as a close second, and with multiple transitions to other ecomorphs throughout the tree.

Ancestral state estimates (not reconstructions) are, at best, weighted averages of your trait based on branch lengths of the phylogeny and the model of evolution used. If your phylogeny or traits or model have error (and they always will) then your ancestral state estimates should be interpreted with care.

9.4 Summary

You should now know how to fit simple models of evolution to discrete trait data in R.

9.5 Practical exercises

In the data folder there is another tree (primate-tree.nex) and dataset (primate-data.csv) for investigating the evolution of primate life-history variables. These data come from the PanTHERIA database (Jones et al. 2009) and 10kTrees (Arnold, Matthews, and Nunn 2010).

Read in the tree and data, then prepare them for a PCM analysis (you may have already done this in a previous exercise which should save you some time). Fit equal rates, symmetric rates and all rates different models to the social status variable for Primates (SocialStatus). Note that social status has two states: non-social (1) or social (2).

Then answer the following questions.

  1. Which model fits best?

  2. What is the transition rate from non-social (1) to social (2), and vice versa, in the all rates different model? What does this mean biologically?

  3. Plot ancestral state estimates of social status (from the ARD model) on the phylogeny. Is the root node estimated as being non-social or social?

  4. Why didn’t I ask you to fit the symmetric model?

References

Arnold, Christian, Luke J Matthews, and Charles L Nunn. 2010. “The 10kTrees Website: A New Online Resource for Primate Phylogeny.” Evolutionary Anthropology: Issues, News, and Reviews 19 (3): 114–18.
Deepak, V, Natalie Cooper, Nikolay A Poyarkov, Fred Kraus, Gustavo Burin, Abhijit Das, Surya Narayanan, Jeffrey W Streicher, Sarah-Jane Smith, and David J Gower. 2021. Multilocus phylogeny, natural history traits and classification of natricine snakes (Serpentes: Natricinae).” Zoological Journal of the Linnean Society 195 (1): 279–98. https://doi.org/10.1093/zoolinnean/zlab099.
Deepak, V., D. J. Gower, and N. Cooper. 2023. “Diet and Habit Explain Head-Shape Convergences in Natricine Snakes.” Journal of Evolutionary Biology 36 (2): 399–411. https://doi.org/https://doi.org/10.1111/jeb.14139.
Jones, Kate E, Jon Bielby, Marcel Cardillo, Susanne A Fritz, Justin O’Dell, C David L Orme, Kamran Safi, et al. 2009. “PanTHERIA: A Species-Level Database of Life History, Ecology, and Geography of Extant and Recently Extinct Mammals: Ecological Archives E090-184.” Ecology 90 (9): 2648–48.