Academic life is particularly full of rank-ordered lists, even if they are frequently not transparently available. From undergraduate examinations to professorial promotions, from REF (and in future TEF) scores to grant-awarding panels, the scores matter. Anyone who has ever been ‘scored’ will worry about the accuracy of the scores given; anyone who has been involved in decision-making will have their own views about the process, its validity and whether their own part in it left them satisfied. Peer review may be the best process we have for making these judgements – which, in essence, all of them rely on – but no one ever claimed peer review was faultless. If you have never sat on any comparable committee you may well be interested in, as well as deeply suspicious of, what actually goes on. If so, you may find this scholarly article from the social sciences illuminating. In it, the author gives much qualitative insight into the goings-on in a series of Swedish Research Council meetings, as he explores a particular phenomenon known as the ‘anchoring effect’, on which more later.
In all of the committees I have been involved with I have only once sat on (and never had the misfortune to chair) a panel where I felt there was something slightly dodgy going on, in the sense that a sub-group was behaving as a cartel. I hasten to add this behaviour was spotted and neutralised by an oversight panel. In general people try really hard to be objective but, as the article demonstrates, this is not as easy as you might think. Consider the following issues, each of which demonstrates a challenge that may arise, implicitly or explicitly:
- If asked to score between 1 and 10 against some criterion, some people will use the full range, but others will probably cluster their scores between 4 and 8, believing that nothing is perfect and nothing completely worthless. Averaging such scores to produce a crude rank-ordered list (even if subsequently modified by discussion, as such raw lists essentially always are) may not be the optimum way to proceed, but it is likely to be what happens.
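The distortion this introduces is easy to demonstrate with numbers. In the sketch below – with entirely invented scores – a full-range rater and a clustered rater disagree about which proposal is best, yet naive averaging simply hands the decision to whoever spreads their scores most widely. Standardising each rater's scores first (a z-score per rater, one possible mitigation, not a claim about what any panel actually does) gives each voice equal weight:

```python
# Hypothetical illustration: rater A uses the full 1-10 range while
# rater B clusters scores between 4 and 8 and holds the opposite view.
# Naive averaging lets A's votes dominate; standardising each rater's
# scores first restores equal influence.
from statistics import mean, stdev

scores = {
    "A": {"p1": 10, "p2": 5, "p3": 1},   # full-range rater
    "B": {"p1": 5,  "p2": 7, "p3": 8},   # clustered rater, reversed view
}
proposals = ["p1", "p2", "p3"]

# Naive averaging of the raw scores.
raw_avg = {p: mean(scores[r][p] for r in scores) for p in proposals}

# Per-rater z-scores: subtract the rater's mean, divide by their spread.
def zscores(vals):
    m, s = mean(vals.values()), stdev(vals.values())
    return {k: (v - m) / s for k, v in vals.items()}

z = {r: zscores(scores[r]) for r in scores}
z_avg = {p: mean(z[r][p] for r in z) for p in proposals}

print(sorted(proposals, key=raw_avg.get, reverse=True))  # A's favourite wins
print(sorted(proposals, key=z_avg.get, reverse=True))    # compromise wins
```

On the raw average the full-range rater's favourite, p1, tops the list; after standardisation the proposal both raters rate middling-to-good, p2, comes out ahead. Same opinions, different winner – purely an artefact of how the scales were used.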
- In the case of a grant proposal, a very convincing case may be made which only the specialist can spot contains a fundamentally flawed assumption; equivalently in promotions, only the person closest to an application may spot unjustified hyperbole in some of the claims. Rightly, these judgements should carry more weight than those of a less expert panel member, but whether such immediately relevant expertise is represented on the panel is, in each case, a matter of chance.
- Absolutely ‘solid’ metrics (e.g. the h-index) may be used improperly, for instance to compare candidates from very different disciplines. If you try to compare a pure mathematician (think Andrew Wiles of Fermat’s Last Theorem fame) with a synthetic chemist, their h-indices may differ by a factor of 10. That says nothing about their relative excellence. This much is pretty obvious, but even if you compare a synthetic chemist with a physical chemist, the differences may be substantial. Sub-disciplines as well as larger groupings matter in these things. Similarly with prizes: focussing on the UK, the Royal Society of Chemistry just happens to have a much larger and more varied collection of prizes than the Institute of Physics, so a solidly good but not-necessarily-stellar chemist is far more likely to be able to list a prize or two than a comparable physicist. You need to be very aware of these differences to be able to tension these solid facts appropriately.
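The h-index itself is a simple, well-defined quantity – the largest h such that the author has h papers each cited at least h times – which is exactly why it looks so ‘solid’. The citation profiles below are invented purely to illustrate the cross-discipline gap described above, not drawn from any real researcher:

```python
# h-index: the largest h such that h papers each have >= h citations.
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts."""
    cited = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(cited, start=1):
        if count >= rank:
            h = rank          # still have `rank` papers cited >= rank times
        else:
            break
    return h

# Hypothetical profiles: a mathematician with a few deep papers versus
# a chemist with many moderately cited ones.
mathematician = [90, 40, 12, 8, 3, 2]
chemist = [60] * 10 + [45] * 20 + [30] * 30

print(h_index(mathematician), h_index(chemist))  # a large gap either way
```

Here the mathematician scores 4 and the chemist 30, despite the mathematician's top paper dwarfing anything in the chemist's list – the metric rewards volume of steadily cited work, which is precisely what differs between fields and sub-fields.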
- The committee procedures may significantly affect the way different panel members participate. I once sat on a research council panel which was dealing with four very different sub-fields. Initially the modus operandi was for each of the four to be taken in turn. This meant it was all too easy for panel members to focus only on the area they were closest to, essentially dozing off (or at least being very bored and not concentrating) during the rest of the presentations. As a result, when the final scores were decided most of the committee had little to say about most of the applications. During the time I served on the panel (and this must have been at least 15 years ago) it became obvious just what a bad way of proceeding this was, and eventually the meetings considered applications simply in alphabetical order. I am sure this led to better decisions, as everyone concentrated throughout the discussions.
- Without needing to invoke either a conspiracy or a genuine conflict of interest, if there is someone who has a prior high opinion of one particular applicant, this may shine through regardless of the case on the table. If this person happens to talk first and is (as a recent committee member described themselves to me) a dogmatic character, a strongly positive message can be conveyed which later speakers find hard, or are unwilling, to challenge. Randomness in the order of speaking may have a significant effect on what is ultimately a collective decision. Chairs do what they can to rein in dogmatic speakers, but are unlikely to know in advance how best to order speakers so that no particular candidate accrues an unreasonable advantage.
The issue of ‘anchoring’ I referred to at the beginning relates most closely to this last point of a preliminary score influencing later results. First identified, I believe, by Daniel Kahneman and Amos Tversky, it is the phenomenon by which the introduction of an initial figure influences how people subsequently score, react or choose to proceed. Given some figure – it could be for scoring a grant or equally for what they are prepared to pay for some product, which was the context Kahneman considered – people use that as a baseline and tweak what they believe is appropriate around it, rather than starting afresh with an objective view of their own. So, in the context of scoring a collection of grants, if the scores submitted in advance by panel members are averaged and presented to the panel before detailed discussion starts, this might influence how the subsequent debate unfolds and hence the final scores which are awarded.
This is the situation which forms the basis of the paper I referred to above by Swedish researcher Lambros Roumbanis, as he analyses panel meetings of the Swedish Research Council. But his paper describes a much broader range of behaviours than just this particular facet, which is why it is so generally informative for those curious about what goes on in such meetings. Of course every panel is different and so the observations must be treated as exemplars rather than as necessarily typical. In my experience people are probably less reflective in their lunch breaks than he apparently discovered, probably because his very presence influenced behaviour. Nevertheless people do agonise over their actions – committee members are not, in my experience, blasé or careless. That does not stop them having internal biases, prejudices and baggage from previous meetings, all of which may affect how they interact with other panel members and the paperwork in front of them. However, let me stress, few if any panel members approach the task with anything but the best of intentions; nor do they tend to set out to game the system for some nefarious purpose. Gross biases tend to be picked up and challenged. Despite all that, there is absolutely no doubt that peer review does not always end up with the right answers, be it down to anchoring, ignorance or incompetence. Alternative methodologies are not likely to be any better. Lottery anyone?