Whilst my back was turned, I had another paper published online early. It’s rather embarrassing that I didn’t notice, because I’m an Executive Editor for the journal. The paper is, of course, superb (most of the work was done by Konstans, my co-author, not me). But it got me thinking a bit about some of the deeper issues.
Konstans had been thinking about interactions between species, like pollinators and flowers, or predators and prey. These sets of species interact, so that (say) some predators will be generalists, eating anything that moves, whereas others will specialise and only eat a small number of prey species (The Beast is a bit like this – sometimes I give him food, and he looks at me as if to say “you’re feeding me that?!”). Ecologists collect data about interactions by, for example, sitting by a tree to see who comes to eat the fruit, or collecting animal faeces and poking around in them to find out what the animals have been eating. From this they make up tables like this:
| | Bunny Rabbit | Cute furry beastie | Big Nice Cow |
|---|---|---|---|
| Sabre Toothed Moggie | 1 | 6 | 4 |
| Nasty Evil Badger | 0 | 0 | 23 |
From tables like this we calculate all sorts of statistics, to measure things like the amount of specialisation in the overall network. One of the innovations Konstans suggested was not to calculate the statistics directly from the raw numbers (i.e. the table above), but instead to recognise that the data are the result of a process, and that it is more important to estimate the statistics for the underlying, real interactions, of which the data are just one realisation.
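To make the contrast concrete, here is a toy sketch in Python (not the method from the paper – the statistic, the Poisson assumption, and the rate value are all my own illustrative choices). The "raw" approach computes a statistic, here connectance, straight from the observed counts; the process-based view instead treats each count as a sample from an underlying interaction rate, so an observed zero might just be undersampling:

```python
import numpy as np

# Toy counts matching the table above (rows: predators, columns: prey).
counts = np.array([
    [1, 6, 4],    # Sabre Toothed Moggie
    [0, 0, 23],   # Nasty Evil Badger
])

# Raw approach: a statistic computed straight from the observed table.
# Connectance = fraction of possible predator-prey links actually observed.
connectance_raw = (counts > 0).sum() / counts.size
print(round(connectance_raw, 3))  # 4 of 6 possible links -> 0.667

# Process-based view: treat each count as (say) a Poisson sample from an
# underlying interaction rate. A zero is then ambiguous: with a hypothetical
# true rate of 0.5 interactions per unit of sampling effort, the chance of
# observing no interactions at all is still substantial.
hypothetical_rate = 0.5
p_observe_zero = np.exp(-hypothetical_rate)  # Poisson P(count = 0 | rate)
print(round(p_observe_zero, 3))  # ~0.607
```

The point of the sketch: the raw table treats the badger's zeros as hard facts, whereas a process model lets us ask how likely those zeros are under different true rates.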
Whilst we (i.e. Konstans) were doing this work, I was musing about the larger context, and realised that this explained something I did a few years ago, and some other work I’d seen, as well as some more work published in the greatest journal known to man. I think all this work shows one way that ecology is (or rather should be) maturing in the way it approaches how it summarises and interprets data.
Although this shift appears technical, what underlies it is a large epistemic change. The old way of doing things was to view the data as what you have, and calculate the statistics on them: the statistics are then just summaries of the data. In contrast, the new approach seeks to get at the processes underlying the data, by modelling the way the data are sampled from this process. The statistics are now a summary of the actual ecological process, filtered through the data that have been collected. So we have moved from summarising what we observe to summarising what we think is going on in nature.
This all sounds fine in theory, but what does it mean in practice? Statistically, the new approach should be better because the sampling effort is accounted for, and there are more natural approaches to estimating the uncertainty in the statistics. But I also think the shift to explicitly estimating properties of the population should help us link the data to the theory.
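One "more natural approach to estimating the uncertainty" is that, once the sampling process is written down as a model, you can simulate replicate surveys from it and see how much the statistic wobbles. A minimal sketch, again with my own stand-in choices (Poisson resampling with the observed counts as means, connectance as the statistic):

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy table as before (rows: predators, columns: prey).
counts = np.array([[1, 6, 4], [0, 0, 23]])

def connectance(table):
    """Fraction of possible links observed at least once."""
    return (table > 0).sum() / table.size

# With a model of the sampling process (here, crudely, Poisson counts with
# the observed values as means), uncertainty in the statistic falls out by
# simulating replicate surveys and recomputing the statistic each time.
replicates = [connectance(rng.poisson(counts)) for _ in range(2000)]
low, high = np.percentile(replicates, [2.5, 97.5])
print(low, high)  # an interval for connectance under this toy model
```

Under this toy model the sparsely observed Moggie–Rabbit link (a single sighting) drops out of many replicates, so the interval reflects exactly the kind of sampling noise the raw-table approach ignores.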
An example of this is Konstans’ other innovation in our paper. The statistics he’s interested in are calculated at the population level, but they are obviously the results of individual behaviours. The shift to a more explicit model of the population makes it easier to write the model as the sum of individual effects. This then means we can ask about the effects of a change in the number of individuals on the network we are studying, which obviously means something different to a change in how individuals behave. So this shift helps us understand what the statistic is measuring, and how it is affected by the normal ecological processes we know and love.
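The abundance-versus-behaviour distinction can be sketched in a couple of lines. This is purely illustrative and not Konstans' actual model: I simply write the population-level interaction rate with each prey species as abundance times an assumed per-individual rate, so the two kinds of change enter separately:

```python
import numpy as np

# Purely illustrative decomposition (not the model from the paper):
# population-level interaction rate with each prey species = number of
# individuals times a per-individual behavioural rate (values assumed).
n_individuals = 10
per_individual = np.array([0.1, 0.6, 0.4])       # rates per prey species
population_rate = n_individuals * per_individual

# Doubling abundance scales every rate equally...
more_animals = 2 * n_individuals * per_individual
# ...whereas a behavioural shift reshapes the vector, keeping the total
# the same but changing how evenly interactions are spread (and hence
# what a specialisation statistic would report).
new_behaviour = n_individuals * np.array([0.6, 0.1, 0.4])
print(population_rate, more_animals, new_behaviour)
```

The two perturbations leave different fingerprints on the network, which is exactly why it matters that the model separates them.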
The focus on the underlying processes should also help us develop ecological theory – the data will (hopefully!) show us interesting patterns that need explaining, and the methods for calculating the statistics give a framework for developing the models, which can be fitted back to the data.
Why isn’t everyone doing this? One reason is that the methods are only now being developed. Perhaps a more important one is that they have not been implemented in easy-to-use R packages (would anyone like to implement Konstans’ ideas as an R package…?), although packages like poilog and mvabund implement some related community-ecology ideas. Another reason is probably inertia: ecologists aren’t used to thinking in these new ways, and so stick with the tried and trusted methods they know. Perhaps what this new approach needs is some success in showing that we genuinely get better results: that we find out something new, or show something different that’s a better indication of what is really going on out there.