Some creationists have become terribly excited by a recent paper and accompanying New Scientist article. It’ll come as no surprise that they have failed to understand the paper, and I’m confident that explaining the paper in a post won’t help, but I think the paper’s interesting, and I have a few thoughts about it anyway.
The problem the paper deals with can be traced back to two great Victorian English Charleses: Darwin and Dodgson. Darwin, of course, wrote a book he called On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, in which he failed to give a good account of the origin of species, but did explain natural selection.
Charles Dodgson, as most people know, was a mathematician who wrote under the pseudonym of Lewis Carroll. One of the things he wrote about was the Red Queen. She was introduced to evolutionary biology by Leigh Van Valen. He suggested that fitness (as measured by extinction rate) may not increase over time, and showed evidence that actual times to extinction follow an exponential distribution, as they would if fitness were constant. Van Valen compared this to the Red Queen’s statement to Alice: “Now, here, you see, it takes all the running you can do, to keep in the same place.” Species are evolving through selection, but the environment is changing, so they are constantly trying to keep up.
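Van Valen’s reasoning can be written down in a couple of lines. As a sketch, write λ for the constant extinction rate (the hazard) and S(t) for the probability that a species is still around at age t; the symbols are mine, not Van Valen’s:

$$
\frac{dS}{dt} = -\lambda\, S(t), \quad S(0) = 1
\;\Longrightarrow\;
S(t) = e^{-\lambda t},
\qquad
f(t) = -\frac{dS}{dt} = \lambda e^{-\lambda t}.
$$

So a constant hazard gives exponentially distributed lifetimes, and a hazard that rose or fell with age would bend the survival curve away from the exponential.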
The present paper looks at Van Valen’s idea from a slightly different perspective. The authors were interested in speciation rather than extinction and argue that the time between speciation events (i.e. the time a species spends as a single species, before it splits) can tell us something about the processes that lead to speciation. They thus compared the distribution of these times (“branch lengths”) in different parts of phylogenetic trees:
A phylogenetic tree, yesterday
In particular, they compare five distributions of branch lengths, for which they could give explanations for how these distributions might come about:
- Exponential: speciation is random, i.e. there is a constant rate at which species split, and this is not affected by the age of the species.
- Weibull: the rate of speciation changes over time: it can increase or decrease.
- log-Normal: there is an accumulation of factors (presumably genetic) which act multiplicatively. Eventually some threshold is reached, when speciation occurs.
- Variable Rates: like the exponential, but the rate of speciation is different for each species. This rate itself follows a Gamma distribution.
- Normal: like the log-normal, but the factors add, not multiply.
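To get a feel for the five mechanisms, here is a rough simulation sketch using only Python’s standard library. All the function names, parameter values and the choice of uniform random “factors” are my own assumptions for illustration; none of this is taken from the paper, and the log-normal and normal cases skip the threshold step and simply multiply or add the factors. The summary statistic is the coefficient of variation (standard deviation divided by mean), which is 1 for an exponential distribution.

```python
import random
import statistics


def exponential_branch(rng, rate=1.0):
    # Constant chance of splitting per unit time, whatever the species' age.
    return rng.expovariate(rate)


def weibull_branch(rng, scale=1.0, shape=2.0):
    # shape > 1: splitting becomes more likely with age; shape < 1: less likely.
    return rng.weibullvariate(scale, shape)


def lognormal_branch(rng, n_factors=50):
    # Many independent positive factors multiplied together (assumed uniform).
    length = 1.0
    for _ in range(n_factors):
        length *= rng.uniform(0.8, 1.25)
    return length


def variable_rates_branch(rng, gamma_shape=5.0, gamma_scale=0.2):
    # Each species draws its own rate from a Gamma distribution,
    # then waits an exponential time at that rate.
    rate = rng.gammavariate(gamma_shape, gamma_scale)
    return rng.expovariate(rate)


def normal_branch(rng, n_factors=50):
    # Many independent factors added together (assumed uniform).
    return sum(rng.uniform(0.0, 1.0) for _ in range(n_factors))


MECHANISMS = {
    "exponential": exponential_branch,
    "weibull": weibull_branch,
    "log-normal": lognormal_branch,
    "variable rates": variable_rates_branch,
    "normal": normal_branch,
}


def simulate(name, n=20000, seed=1):
    # Draw n branch lengths from the named mechanism.
    rng = random.Random(seed)
    draw = MECHANISMS[name]
    return [draw(rng) for _ in range(n)]


def coefficient_of_variation(lengths):
    return statistics.stdev(lengths) / statistics.fmean(lengths)


if __name__ == "__main__":
    for name in MECHANISMS:
        cv = coefficient_of_variation(simulate(name, n=5000))
        print(f"{name:>15}: CV = {cv:.2f}")
```

With these made-up settings the mechanisms give visibly different spreads, but other parameter choices can push several of them close together, which matters for the model comparison discussed further down.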
So, Venditti et al. argue, if we can say that a tree has one of these distributions of branch lengths, we can say something about the processes. They thus collected sequences from 101 data sets, from species like bumblebees, cats, turtles and roses. For each data set they fitted phylogenetic trees using all of the models, and then found which model fitted each data set best1. This is what they got:
Percentage of data sets for which each model provided the best overall description of the branch-length distribution (models described in text). The coloured bars are the results from the reversible-jump procedure (see text), the grey bars record the results from the harmonic mean test. Error bars, standard error. Source: Fig. 1
The bars show the proportion of data sets where that distribution fitted best. The conclusion is simple: the exponential is the overwhelming winner. Hence, Venditti et al. conclude, speciation is a random event: there is nothing intrinsic to the species (such as its age) that makes it more or less likely to speciate. I think this would fit well into how most people think about speciation: it is caused by outside events like mountains rising up in the middle of a species’ range, or a continent inconveniently splitting in half.
I have a couple of methodological concerns about this study, which I will blog about later. But one is important to the whole study. The exponential distribution has one parameter, whilst the others have two. This makes model comparison difficult: a model with more parameters has more freedom, so it will almost always fit the data at least as well. So, if we are to compare models, we have to penalize the complex models. Skipping the details, Venditti et al. set the model up to give a debt:
The average prior cost we assessed the two-parameter models translates to having to overcome a ‘debt’ of about 1.1 log-units. That is, to perform better than the exponential the two-parameter model would need to improve the log-likelihood by this amount.
But they compared models by rank: finding out which model was best. If all of the models fit equally well (and the authors admit that with the exception of the normal, they “can produce almost indistinguishable densities”), the exponential would come top. It’s like having a race where one runner is 10% faster than the rest: they will still win most of the races (but not all: sometimes they will have an off day, or fall over, etc. The statistical equivalent is that sometimes another model will, by chance, fit a data set better). Now, it might be that the exponential was much better than the rest, but we aren’t given the information to decide this2. So, I’m not (yet) convinced that the authors have even found anything other than an artifact.
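The ranking worry can be poked at with a small simulation. This is only a sketch under loud assumptions: it swaps the paper’s Bayesian machinery (reversible jump, harmonic means) for plain maximum likelihood with a fixed 1.1 log-unit debt, it only pits the exponential against the Weibull, and the sample size, number of data sets and “true” Weibull shape are all numbers I made up. The function names are hypothetical too.

```python
import math
import random


def exponential_loglik(xs):
    # Maximised log-likelihood of the one-parameter exponential model.
    n = len(xs)
    mean = sum(xs) / n
    return -n * math.log(mean) - n


def weibull_loglik(xs):
    # Maximised log-likelihood of the two-parameter Weibull model.
    # Assumes every x is strictly positive.
    n = len(xs)
    biggest = max(xs)
    ys = [x / biggest for x in xs]  # rescale so powers cannot overflow
    log_ys = [math.log(y) for y in ys]
    mean_log = sum(log_ys) / n

    def score(k):
        # Profile likelihood equation for the shape k; increasing in k.
        powers = [y ** k for y in ys]
        weighted = sum(p * ly for p, ly in zip(powers, log_ys))
        return weighted / sum(powers) - 1.0 / k - mean_log

    lo, hi = 0.01, 100.0
    for _ in range(60):  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = (sum(y ** k for y in ys) / n) ** (1.0 / k)
    loglik_rescaled = (n * math.log(k) - n * k * math.log(scale)
                       + (k - 1.0) * sum(log_ys) - n)
    # Undo the rescaling so the value is comparable with exponential_loglik.
    return loglik_rescaled - n * math.log(biggest)


def exponential_win_fraction(true_shape, n_branches=50, n_datasets=500,
                             debt=1.1, seed=0):
    # Fraction of simulated data sets in which the Weibull fails to
    # improve the log-likelihood by more than the debt.
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_datasets):
        xs = [rng.weibullvariate(1.0, true_shape) for _ in range(n_branches)]
        gain = weibull_loglik(xs) - exponential_loglik(xs)
        if gain < debt:
            wins += 1
    return wins / n_datasets


if __name__ == "__main__":
    for shape in (1.0, 1.1, 1.5):
        frac = exponential_win_fraction(shape, n_datasets=200)
        print(f"true Weibull shape {shape}: exponential ranked first "
              f"in {frac:.0%} of data sets")
```

Setting the true shape a little away from 1 gives branch lengths that are not exponential, yet the exponential can still rank first in a large share of data sets. A tally of winners cannot tell that apart from the exponential being genuinely much better, which is why the size of the differences matters.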
Even if the statistical results are correct, I am not sure about the interpretation and what it means for natural selection. I’d like to see a good scenario for speciation that leads to a normal distribution. The only one I can think of is that shortly after a new species diverges, it splits into two populations. Over time, barriers to reproduction build up, until the two populations have diverged enough to be isolated. This doesn’t look like a general mechanism to me: why would a species split into diverging populations? Only a small amount of gene flow is needed to keep populations connected genetically. It seems more likely that there is a trigger for a species to split, and selection might act after this trigger. If the time between these triggers is long, this will dominate the distribution of branch lengths. So Venditti et al. are, I think, implicitly assuming that the branch lengths are short. I’m arm-waving here, but I would be interested in seeing some modelling work to assess how different mechanisms affect branch lengths27.
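As a very crude version of the arm-waving, here is a toy model, entirely my own invention: a branch length is an exponential wait for a trigger plus a divergence period built from many small additive steps (so roughly normal). The step distribution, the number of steps and the function names are all assumptions made for illustration.

```python
import random
import statistics


def branch_length(rng, mean_trigger_wait, n_steps=50):
    # Time spent waiting for an external trigger (assumed exponential).
    wait = rng.expovariate(1.0 / mean_trigger_wait)
    # Time for barriers to build up afterwards: a sum of many small
    # additive steps, so approximately normal (assumed uniform steps).
    divergence = sum(rng.uniform(0.0, 1.0) for _ in range(n_steps))
    return wait + divergence


def branch_cv(mean_trigger_wait, n=5000, seed=1):
    # Coefficient of variation of simulated branch lengths.
    # An exponential distribution has CV = 1; a tight normal has CV near 0.
    rng = random.Random(seed)
    lengths = [branch_length(rng, mean_trigger_wait) for _ in range(n)]
    return statistics.stdev(lengths) / statistics.fmean(lengths)


if __name__ == "__main__":
    for mean_wait in (1.0, 25.0, 1000.0):
        print(f"mean trigger wait {mean_wait:>7}: CV = {branch_cv(mean_wait):.2f}")
```

When the trigger wait is short compared with the divergence period, the branch lengths look roughly normal; when it is long, the exponential wait swamps everything else, whatever selection does afterwards.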
My reaction to this paper illustrates what I think is a bigger problem. There is a field of applied mathematics called “inverse problems”. This is all about seeing an effect, and inferring the cause (it’s what scientists have been doing for years, of course). A problem is that any effect can have several causes, and they might not be distinguishable from the data. One thing I’ve seen too many times is someone taking a pattern, fitting a mechanism to it and then declaring that the mechanism produced the pattern93. The mechanisms are often dynamic, but they are fitted to static patterns, which requires additional assumptions (e.g. equilibrium) that might not hold. I think we should be suspicious of these exercises, and instead look to fit dynamic models to the dynamics of the processes. This needs more data, which is a bitch, but I think we will also learn much more about the processes we are looking at.
Venditti et al. improve on this by comparing several models, but they don’t look at the dynamics they are interested in, only the end result (the time to speciation). They also only go half-way in the inverse problem process: the link between the mechanism of speciation and the distribution is not rigorous. Perhaps the best thing about this paper is that it might make people think a bit more about how mechanisms of speciation affect the trees of life.
Venditti, C., Meade, A., & Pagel, M. (2009). Phylogenies reveal new interpretation of speciation and the Red Queen. Nature, 463 (7279), 349-352. DOI: 10.1038/nature08630
2 For those who want to know, the Bayes Factors for the exponential versus the other models could have been presented.
27 I had a quick search of the literature, couldn’t find anything. But I might have been looking in the wrong places.
93 I’ll spare you my Unified Neutral Theory of Biogeography hobby-horse for now.