Changing ecologists’ statistics to statistics about nature

Whilst my back was turned, I had another paper published online early. It’s rather embarrassing that I didn’t notice, because I’m an Executive Editor for the journal. The paper is, of course, superb (most of the work was done by Konstans, my co-author, not me). But it got me thinking a bit about some of the deeper issues.

Konstans had been thinking about interactions between species, like pollinators and flowers, or predators and prey. These sets of species interact, so that (say) some predators will be generalists, eating anything that moves, whereas others will specialise and only eat a small number of prey species (The Beast is a bit like this: sometimes I give him food, and he looks at me as if to say “you’re feeding me that?!”). Ecologists collect data about interactions by, for example, sitting by a tree to see who comes to eat the fruit, or collecting animal faeces and poking around in them to find out what the animals have been eating. From these observations they build tables like this one:

Predator \ Prey        | Bunny Rabbit | Cute furry beastie that does not have TB, oh no. | Big Nice Cow
Sabre Toothed Moggie   | 1            | 6                                                 | 4
Vicious Dog            | 4            | 3                                                 | 8
Nasty Evil Badger      | 0            | 0                                                 | 23

From tables like this we calculate all sorts of statistics, to measure things like the amount of specialisation in the overall network. One of the innovations Konstans suggested was not to calculate the statistics directly from the raw numbers (i.e. the table above), but instead to recognise that the data are the result of a process: what matters is estimating the statistics for the underlying, real interactions, of which the data are just one realisation.
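To make the contrast concrete, here is a minimal Python sketch of the old way: a specialisation statistic computed straight from the raw counts in the table above. The statistic (the Shannon diversity of each predator's observed diet) and the function names are illustrative assumptions, not necessarily what the paper uses.

```python
import math

# The example table from the post: rows are predators, columns are prey,
# cells are counts of observed interactions.
PREY = ["Bunny Rabbit", "Cute furry beastie", "Big Nice Cow"]
COUNTS = {
    "Sabre Toothed Moggie": [1, 6, 4],
    "Vicious Dog": [4, 3, 8],
    "Nasty Evil Badger": [0, 0, 23],
}


def shannon_diversity(counts):
    """Shannon diversity (natural log) of one predator's observed diet.

    0 means every observation was of a single prey species (a specialist);
    larger values mean the observations are spread over more prey.
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # 0 * log(0) is taken as 0
            p = c / total
            h -= p * math.log(p)
    return h


def raw_diet_diversity(table):
    """The 'old way': compute the statistic directly on the observed counts."""
    return {predator: shannon_diversity(row) for predator, row in table.items()}


if __name__ == "__main__":
    for predator, h in raw_diet_diversity(COUNTS).items():
        print(f"{predator}: H = {h:.3f}")
```

On these numbers the badger comes out as a perfect specialist (H = 0), simply because no rabbits or beasties turned up among its 23 observations.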

Whilst we (i.e. Konstans) were doing this work, I was musing about the larger context, and realised that this explained something I did a few years ago, some other work I’d seen, and some more work published in the greatest journal known to man. I think all this work shows one way that ecology is (or rather should be) maturing in how it summarises and interprets data.

Although this shift appears technical, what underlies it is a large epistemic change. The old way of doing things was to view the data as what you have, and calculate the statistics on them. The statistics are then just summaries of the data. In contrast, the new approach seeks to get at the processes underlying the data, by modelling the way the data are sampled from these processes. So the statistics are now a summary of the actual ecological process, filtered through the data that have been collected. We have moved from summarising what we observe to summarising what we think is going on in nature.
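A minimal sketch of the new way, under illustrative assumptions (not the model in the paper): each count in the table is treated as a Poisson draw from an unobserved interaction rate, each rate gets a Gamma(a, b) prior, and posterior draws of the rates are pushed through the same diet-diversity statistic. What comes out is a distribution for the statistic of the underlying process, rather than a single number for the data. The prior values and function names are hypothetical choices.

```python
import math
import random

# The example table from the post (rows: predators, columns: prey).
COUNTS = {
    "Sabre Toothed Moggie": [1, 6, 4],
    "Vicious Dog": [4, 3, 8],
    "Nasty Evil Badger": [0, 0, 23],
}


def shannon_diversity(weights):
    """Shannon diversity (natural log) of a vector of non-negative weights."""
    total = sum(weights)
    h = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            h -= p * math.log(p)
    return h


def posterior_rate_draw(count, rng, effort=1.0, a=0.5, b=0.01):
    """One draw of an underlying interaction rate given its observed count.

    Assumed model: count ~ Poisson(rate * effort), rate ~ Gamma(shape=a, rate=b).
    By conjugacy, rate | count ~ Gamma(shape=a + count, rate=b + effort).
    random.gammavariate takes shape and *scale*, hence 1 / (b + effort).
    """
    return rng.gammavariate(a + count, 1.0 / (b + effort))


def posterior_diet_diversity(row, n_draws=4000, seed=1):
    """Posterior mean and central 95% interval of one predator's diet diversity."""
    rng = random.Random(seed)
    draws = sorted(
        shannon_diversity([posterior_rate_draw(c, rng) for c in row])
        for _ in range(n_draws)
    )
    mean = sum(draws) / n_draws
    return mean, draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]


if __name__ == "__main__":
    for predator, row in COUNTS.items():
        mean, lo, hi = posterior_diet_diversity(row)
        print(f"{predator}: H = {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

With these assumptions the badger, whose raw diversity is exactly 0, gets a small positive posterior diversity: not seeing it eat rabbits in 23 observations is only limited evidence that it never does. How large that value is depends on the prior, here the arbitrary choice a = 0.5, b = 0.01.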

This all sounds fine in theory, but what does it mean in practice? Statistically, the new approach should be better because the sampling effort is accounted for, and there are more natural approaches to estimating the uncertainty in the statistics. But I also think the shift to explicitly estimating properties of the population should help us link the data to the theory.
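The point about sampling effort can be sketched with the same kind of assumed Poisson-Gamma model: if a pair of species is watched for E hours, the count is modelled as Poisson(rate × E), so the posterior is for a rate per hour rather than a raw count, and its spread narrows as effort grows. The model, prior, function name and the counts and hours in the example are all hypothetical.

```python
import math


def rate_posterior(count, effort, a=0.5, b=0.01):
    """Posterior mean and sd of an interaction rate per unit effort.

    Assumed model (illustrative, not the paper's): count ~ Poisson(rate * effort)
    with a Gamma(shape=a, rate=b) prior on rate, giving the posterior
    Gamma(shape=a + count, rate=b + effort).
    """
    shape = a + count
    rate = b + effort
    return shape / rate, math.sqrt(shape) / rate


if __name__ == "__main__":
    # Hypothetical numbers: the same predator-prey pair watched for 10 hours
    # and for 40 hours.
    for count, hours in [(4, 10.0), (16, 40.0)]:
        mean, sd = rate_posterior(count, hours)
        print(f"{count} sightings in {hours:g} h: {mean:.3f} +/- {sd:.3f} per hour")
```

Raw counts of 4 and 16 look very different, but once effort is accounted for they give similar rates, and the longer watch gives roughly half the posterior standard deviation.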

An example of this is Konstans’ other innovation in our paper. The statistics he’s interested in are calculated at the population level, but they are obviously the results of individual behaviours. The shift to a more explicit model of the population makes it easier to write the model as the sum of individual effects. This then means we can ask about the effects of a change in the number of individuals on the network we are studying, which obviously means something different to a change in how individuals behave. So this shift helps us understand what the statistic is measuring, and how it is affected by the normal ecological processes we know and love.
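A sketch of writing a population-level quantity as a sum of individual effects, assuming (purely for illustration, and not necessarily as in the paper) that a species' rate of eating each prey is the sum of its individuals' rates. A change in the number of individuals then scales every rate equally and leaves diet proportions, and hence a proportion-based statistic such as Shannon diversity, untouched, whereas a change in individual behaviour alters it. The individual rate vectors and names are hypothetical.

```python
import math


def shannon_diversity(weights):
    """Shannon diversity (natural log) of a vector of non-negative weights."""
    total = sum(weights)
    h = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            h -= p * math.log(p)
    return h


def species_rates(individual_rates):
    """Species-level interaction rates, written as the sum of individual rates.

    individual_rates holds one list per individual, giving that individual's
    rate of interacting with each prey species.
    """
    return [sum(column) for column in zip(*individual_rates)]


# Hypothetical per-individual rates of eating each of three prey species.
GENERALIST = [1.0, 1.0, 1.0]
PICKY = [0.1, 0.1, 2.8]

baseline = species_rates([GENERALIST] * 5)             # 5 generalist individuals
more_individuals = species_rates([GENERALIST] * 10)    # change in numbers only
changed_behaviour = species_rates([PICKY] * 5)         # change in behaviour only
mixed = species_rates([GENERALIST] * 3 + [PICKY] * 2)  # a mixture of the two

if __name__ == "__main__":
    for name, rates in [("baseline", baseline), ("more individuals", more_individuals),
                        ("changed behaviour", changed_behaviour), ("mixed", mixed)]:
        print(f"{name}: total = {sum(rates):.1f}, H = {shannon_diversity(rates):.3f}")
```

Doubling the number of generalists doubles the total number of interactions but leaves H at ln 3; replacing them with picky individuals lowers H, and a mixture falls in between.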

The focus on the underlying processes should also help us develop ecological theory – the data will (hopefully!) show us interesting patterns that need explaining, and the methods for calculating the statistics give a framework for developing the models, which can be fitted back to the data.

Why isn’t everyone doing this? One reason is that the methods are only now being developed. Perhaps a more important one is that the methods have not been implemented in easy-to-use R packages (would anyone like to implement Konstans’ ideas as an R package…?), although packages like poilog and mvabund implement some other community ecology ideas. Another reason is probably inertia: ecologists aren’t used to thinking in these new ways, and so stick with the tried and trusted methods they know. Perhaps what this new approach needs is some successes showing that we genuinely get better results: that we find out something new, or show something different that’s a better indication of what is really going on out there.

This entry was posted in Ecology, Statistics.

8 Responses to Changing ecologists’ statistics to statistics about nature

  1. Can you put in a link to Konstans’ paper? I’d like to read it. Also the second link in your piece is broken. Thanks.

  2. jebyrnes says:

    I’ve read over this paper a few times, and I’d love to hear about future work on implementing this method more generally. I assemble my webs largely from the literature, and always worry about this problem. I also worry that, when sampling webs at a local spatial scale, if I miss anything my site-level web will not be accurate. This and a few other papers I’ve started to find have given me some clues about how to think more carefully about this uncertainty, but I’m not sure I have it nailed yet. It’s a tricky problem: how DO you measure changes in food web structure if you are working from potentially incomplete food web data, and you have all of the usual detectability problems of sampling species presence/absence? Hrm.

  3. Bob O'H says:

    I’ve started an analysis of some newer data; hopefully it’ll be published eventually.

    One way or another I think you need an observation model, from which you can estimate the probability of a link being present but not observed. I haven’t worked with literature data enough to have any real thoughts about how to do it, but it’s an interesting problem.

    From the statistical modelling point of view, I’m not worried about incomplete web data: there are methods from occupancy modelling to deal with them, and the rest is “just” details. If there is a problem then it’s collecting the right data to be able to estimate the missing links in the web: I suspect the models can go horribly wrong (just like with species richness estimation), so we have to work out what data we need. Or make sure we’re looking at aspects of the food webs that aren’t strongly affected by rare links.
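    As a minimal sketch of such an observation model (illustrative assumptions, not the model from the paper), here is a basic single-season occupancy model for one link, in Python. If the link exists with probability psi and is detected on each survey with probability p, Bayes' rule gives the probability that a link never seen in n surveys is really there, and psi and p can be estimated from repeat surveys at several sites. The detection data are made up, the grid-search fit is deliberately crude, and the function names are hypothetical.

```python
import math


def prob_present_given_unseen(psi, p, n_surveys):
    """P(link present | never observed in n_surveys), by Bayes' rule.

    psi: prior probability that the interaction exists at a site.
    p:   probability of detecting it on one survey, if it exists.
    """
    missed = psi * (1.0 - p) ** n_surveys
    return missed / (missed + (1.0 - psi))


def occupancy_log_lik(psi, p, detections, n_surveys):
    """Log-likelihood of a basic single-season occupancy model.

    detections[i] is how many of the n_surveys at site i recorded the link.
    The binomial coefficient is omitted: it does not depend on psi or p.
    """
    ll = 0.0
    for d in detections:
        if d > 0:
            ll += math.log(psi) + d * math.log(p) + (n_surveys - d) * math.log(1.0 - p)
        else:
            # Either present but missed every time, or genuinely absent.
            ll += math.log(psi * (1.0 - p) ** n_surveys + (1.0 - psi))
    return ll


def fit_occupancy(detections, n_surveys, steps=99):
    """Crude maximum-likelihood fit of (psi, p) by grid search inside (0, 1)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda pp: occupancy_log_lik(pp[0], pp[1], detections, n_surveys))


# Hypothetical data: one predator-prey link surveyed 4 times at each of 10 sites.
N_SURVEYS = 4
DETECTIONS = [2, 0, 1, 0, 3, 0, 0, 1, 2, 0]

if __name__ == "__main__":
    psi_hat, p_hat = fit_occupancy(DETECTIONS, N_SURVEYS)
    print(f"psi = {psi_hat:.2f}, p = {p_hat:.2f}")
    unseen = prob_present_given_unseen(psi_hat, p_hat, N_SURVEYS)
    print(f"P(present | unseen at a site) = {unseen:.2f}")
```

    Here half the sites show the link, but the fitted psi is above 0.5, because a site where the link was never seen still has some chance of having it.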

  4. Sean Hoban says:

    Reading over your post, I think the approach sounds a lot like what molecular ecologists are doing with Approximate Bayesian Computation (ABC). Instead of just using statistics like heterozygosity to describe a population, we simulate a process and see whether it could have produced the data we observe. We simulate several different processes/models with a range of parameters, and then match the observed data to the process that could have produced them. With enough simulations you can not only identify the underlying process but also obtain a distribution of probable parameter values. This is essentially what you are proposing, I think? If so, ABC might give you some ideas for approaches: it has developed into a very rigorous procedure over the past six or so years. The following articles might help:
    http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2010.04690.x/full
    http://www.nature.com/nrg/journal/v13/n2/full/nrg3130.html
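    A minimal sketch of the simplest form of ABC, rejection ABC, in Python: draw a parameter from the prior, simulate data from the process, and keep the parameter only if a summary of the simulated data is close to the same summary of the observed data. The Poisson process, uniform prior, mean-count summary, tolerance, data and function names are all illustrative assumptions; comparing several processes would repeat this for each candidate model.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson random variate via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def abc_rejection(observed, n_sims=10000, prior_max=20.0, tol=0.5, seed=42):
    """Rejection ABC for the rate of a Poisson process.

    Draw a rate from a Uniform(0, prior_max) prior, simulate a data set of the
    same size as `observed`, and keep the rate if the simulated mean is within
    `tol` of the observed mean. The kept rates approximate the posterior.
    """
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)
    accepted = []
    for _ in range(n_sims):
        lam = rng.uniform(0.0, prior_max)
        sim = [poisson_draw(lam, rng) for _ in observed]
        if abs(sum(sim) / len(sim) - obs_mean) <= tol:
            accepted.append(lam)
    return accepted


# Hypothetical counts, e.g. visits by one pollinator to one plant on five days.
OBSERVED = [3, 5, 4, 6, 2]

if __name__ == "__main__":
    post = abc_rejection(OBSERVED)
    print(f"accepted {len(post)} of 10000; posterior mean rate = {sum(post) / len(post):.2f}")
```

    The kept rates approximate the posterior distribution of the rate; shrinking the tolerance improves the approximation but accepts fewer simulations.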