Some Scores are More Equal than Others

Having committed to sit on an EPSRC panel for the first time in many years, I had my sights set on a post about the Shared Services Centre of the Research Councils. Clearly this is a popular topic for academics to target, since I was pipped to the post; just a few days ago Telescoper vented his bile on them for their inefficiencies over his P60 forms at the end of the tax year. I have a different gripe to make. As long-term readers will know, for some time I chaired what was uninspiringly known as Committee C of the BBSRC. I stepped down after a reorganisation of the committee’s remit last summer took it into fields about which I knew nothing – on the whole I believe a little relevant expertise is desirable in a chair, although not everyone may agree. Of late this has meant a diminution of posts on this blog about committee work and impact.

I thought that BBSRC experience meant I had a fair idea of what to expect from the SSC, having seen BBSRC grants move from their own internal processes to use of this central facility. The transition had not been without hiccoughs, and the ‘paperwork’ we received for the first meeting after the switch was in a much less satisfactory state than that we had been sent for previous meetings, but we did manage to find our way around it. The ‘paperwork’ – both before and after the transition – was in fact on a carefully indexed USB stick, the weight of material we had to take to the meeting correspondingly light. The later problems came from the rather circuitous way things like referees’ reports were woven into the structure, making the whole thing harder, but not impossible, to navigate.

I cannot help comparing this with what was sent to me by the SSC for the EPSRC panel. In this case the paperwork came with no inverted commas around it; it was literally a mound of papers, loosely tied together with those awful green ties reminiscent of exam scripts. Furthermore, the information about procedures, scoring guidelines, details of where I was staying, the codes for referees and so on was enclosed in a loose bundle which, by the time I’d dropped the pack on the floor a few times, was hopelessly muddled up. Why? Why do I have to strain my back lugging this stuff around? The thickness of the wodge of papers was such that it destroyed the spring on the carrier at the back of my bicycle (should I add the cost of a new one to my expenses claim?). Why are they determined to destroy my new year’s resolution of struggling towards a paperless life? I am baffled by the fact that the SSC – let me remind you it stands for Shared Services Centre – actually operates completely different systems for different research councils.

I cannot believe this leads to ‘efficiency savings’, as no doubt ministers were promised in some delivery plan. As has been pointed out, like many other initiatives designed to save money, in fact it has done anything but. When I pressed the Secretariat, it transpired I could have asked for stuff to be sent on a CD, but as I don’t have an internal CD drive on my laptop I’m not sure that would really have helped. I certainly wasn’t asked in advance which I preferred, which perhaps would have been the ideal solution. Perhaps the majority of EPSRC-relevant people really do still want ‘paperwork’ in its genuinely paper incarnation, but I find that slightly surprising. I did manage to get the forms on which to write my comments sent electronically, so that I could type up my responses – but again that required a special request (quickly acceded to, I should add).

There are other things that have been revealed to me by sitting on an EPSRC panel again after a long gap. It shows once again how tribal we are, how different (sub)-disciplines hold variable views about ‘normal’ behaviour, and how much this may colour our funding landscape. The EPSRC wish to ‘shape capabilities’ but are also keen to stress (as stated in a letter from their CEO David Delpy and Chairman John Armitt to Nature) that

‘Research excellence remains pre-eminent and the Council will continue to support applications that are deemed excellent by peer review.’

Hence, those disciplines which are more collegial – or perhaps less critical – will continue to thrive over those where internecine warfare between camps or simply a more judgemental attitude to filling in referee forms and scores holds sway.

This is a worry. Without wishing to pick out any particular area over another (apart from anything else the sample size is too small to do that safely, whatever my suspicions may be), I note that a number of years ago, when I sat on the now defunct EPSRC Physics SAT (Strategic Advisory Team), sufficient unease was manifest about scoring differentials between communities that the EPSRC carried out its own internal audit. This confirmed concerns that referees from one particular sub-discipline were inclined to give the top score willy-nilly, whereas other fields were more severe in their scoring and inclined to use the full range available. That particular sub-field was rapped over the knuckles and, I presume, cleaned up its act at the time. I can’t help questioning whether similar differences in behaviour are still prevalent. This, in times of cash shortages and sculpting of the portfolio, means that without some norm-referencing of scores, coupled with scrutiny of the tone of comments, those areas where referees are tough – in word or number – will be at an even more severe disadvantage.

The trouble is that unless this matter is regularly revisited to check for discrepancies, it has all the potential to become a self-reinforcing scenario. Remember that EPSRC rules (unlike BBSRC’s) mean that the panel can only evaluate grants based on the referees’ comments and scores. Panel members are not allowed to put in their own views where they differ from what may be written on the referee forms. Again, this is different from BBSRC practice, where the panel may choose to feed in their own specialist knowledge (although an EPSRC panel can choose essentially to reject a referee’s submission if there seems some clear reason why they should, for instance where there is evidence that the referee hasn’t read the proposal carefully enough, or might have a conflict of interest which hadn’t initially been spotted). What worries me is the tendency for referees to award the highest score, not always backed up by the tone of their comments. I wish the community would recognize how unhelpful this is.

The weird thing is, to revert to my previous experiences with BBSRC, I never saw this behaviour there. Comments across the board tended to be more critical, and more differentiated in tone. (My gripe with BBSRC refereeing would be different: that community seems to be in thrall to the need for a hypothesis or two to act as the focus for a proposal. No hypothesis generally means no funding, although it is a hopelessly crude discriminator in my view.) I am not sure if the differences are endemic to physicists, or reflect a subconscious fear among EPSRC referees that they should ‘judge not, that ye be not judged’. However, I think this is a potentially damaging situation that should be carefully monitored. Maybe when refereeing I too should join the club of simply awarding the top score, along with bland comments that provide the panel with little assistance.

The EPSRC obviously tries to overcome disciplinary differences by having a group of ‘rovers’ who visit all the panels taking place on a given day, with the aim of tensioning between disciplines so that rank-ordered lists can be appropriately interwoven. But I don’t believe that removes the problem of some areas having a tradition of making detailed critical comments, and others tending to say – even if at some length – something that amounts to ‘the team is good, the science is interesting, this should be funded’.

I’d be interested to know if readers who have served on panels of different sorts and in different parts of the Research Council family share similar anxieties.

This entry was posted in Science Culture and Science Funding.

8 Responses to Some Scores are More Equal than Others

  1. Gabriel says:

    Thanks, Athene. It’s so refreshing to read your blogs. I attend talks and meetings on these topics feeling we’re being spoken at, wondering “what’s the point?” when the agenda seems to be about complying with the status quo rather than about how to shape, as a community, the practices we would like to see. It’s heartening to see someone at the top of their career sharing similar views, and even uncertainties. If nothing else, for personal sanity!

  2. Heather says:

    Very insightful look behind those scenes. Doesn’t it feel like we are only playing at knowing what we’re (collectively) doing? I can also sympathise over the prejudice against non-hypothesis-driven research, which discounts things like natural history, palaeontology and nosology and is really simply another manifestation of the same kind of sectarian cultural bias. I have been on the receiving end of that myself.

  3. Mark Claydon-Smith says:

    Interesting set of observations.

    Referee cliques are a problem – but more because they lead to bad feeling and disciplinary tribalism. They are just one manifestation of “self-similarity bias”, which is probably the most intractable and pernicious element within peer review. The debate this week about female members on company boards comes from the same source – i.e. the tendency of groups with the same set of cultural values to reinforce and support each other’s world-view.

    About 8 years ago I did a large-scale quantitative analysis of refereeing conformance (i.e. the tendency of reviewers to give the same or different scores as other reviewers). This showed that reviewers from the same discipline would tend to agree on around 73% of occasions (77% for physics), whereas reviewers from different disciplines agreed only 38% of the time. The reasons are fairly obvious – but this does have real implications for the assessment of interdisciplinary research.

    The difference between referee scores and narrative is pandemic – despite several form changes specifically to address this. Grade inflation also seems as prevalent in higher education as in every other branch of education – it is always the final competitive element that brings out relativities – mostly.

    Your comments on the SSC were depressing – if unexpected. Cost cutting doesn’t always have to lead to reduced flexibility and service!

    Mark Claydon-Smith
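The conformance analysis Mark describes above can be thought of as a pairwise-agreement calculation: for every pair of reviewers scoring the same proposal, check whether they gave identical scores, and split the tally by whether the pair share a discipline. A minimal sketch follows; the review data, disciplines and function name are invented for illustration, since the original study’s data and method are not public:

```python
from itertools import combinations

# Hypothetical review data: proposal -> list of (reviewer_discipline, score).
reviews = {
    "P1": [("physics", 6), ("physics", 6), ("chemistry", 4)],
    "P2": [("physics", 5), ("biology", 3)],
    "P3": [("chemistry", 4), ("chemistry", 4), ("physics", 6)],
}

def agreement_rates(reviews):
    """Fraction of reviewer pairs on the same proposal who give identical
    scores, split by whether the pair share a discipline."""
    counts = {"same": [0, 0], "different": [0, 0]}  # [agreed, total]
    for pairs in reviews.values():
        for (d1, s1), (d2, s2) in combinations(pairs, 2):
            key = "same" if d1 == d2 else "different"
            counts[key][1] += 1
            if s1 == s2:
                counts[key][0] += 1
    return {k: (agreed / total if total else None)
            for k, (agreed, total) in counts.items()}

print(agreement_rates(reviews))
```

On this toy data same-discipline pairs always agree and cross-discipline pairs never do – an exaggerated version of the 73%-versus-38% gap Mark reports.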

  4. Mark
    Thanks. It is good to know EPSRC are aware of many of these problems. As you say, refereeing is a particular problem for interdisciplinary work – a debate I have regularly had with Dave Delpy. I haven’t really seen steps being put forward to solve the problem and, in a field like mine at the physics–biology interface, it can cause real problems for the community. If the work always fails then EPSRC can say either (a) it is a sub-standard discipline or (b) good applications aren’t coming in because people give up. I know an attempt was made to get round this by signposting, but with the ongoing refereeing problem that is unlikely really to resolve the issue.

    However my point here was slightly different; it isn’t so much that different sub-disciplines may disagree, but the whole style of refereeing may differ between communities. I don’t want to name specific fields here, but am happy to do that offline.

    I do hope EPSRC will continue to examine, quantitatively, the way refereeing is carried out along the lines you describe of your earlier study.

    • Mark Claydon-Smith says:

      I think your point about different style of reviewing is more about the nature of research quality. I always find it ironic that we can have endless debates about the categorisation of “impact”, when the elephant in the room is “excellence”. Quality is a subjective judgement about potential.

      The interface with the social sciences is a particularly good example. Social scientists have an obsession with methodology and a culture of critique which is quite pronounced – and alien to EPS. However, my view is that this is more a reflection of the nature of the “research” in social sciences, where repeatability is much more challenging, and results are more emergent and interpretative – certainly than in the natural sciences. The nature of the research is then strongly reflected in the attitudes of reviewers.

      Another interesting interface is BBSRC/EPSRC; the bioscience interface with the chemical sciences is much more natural/effective than that of biosciences with engineering – and this is (in my opinion) down to community world-view rather than institutional factors.

      Trying to be more upbeat about interdisciplinary research – things do change over time, as emerging fields generate their own communities which define themselves as sub-disciplines. Part of the role of the Research Councils is to nurture and foster emergence – inevitably this is a bit piecemeal.

      Final thought – interdisciplinary review is like interdisciplinary research. It is more about groups than about individuals.

      Mark

  5. Paul says:

    A very interesting read, Athene. In fact this is an issue I casually raised with a member of the EPSRC team at one of their recent regional meetings. However, she assured me that there were no issues with regard to reviewing cultures across sub-disciplines that would require any tensioning of scores at panel. I was, of course, a little sceptical.

    Out of interest, was there (in your opinion) any correlation between generous (or more severe) reviewing practices and sub-disciplines that have been earmarked for reductions (or growth) in EPSRC funding through the Shaping Capabilities model?

    While peer review, in principle, is the ideal way to assess the work or ideas of other academics, in practice it is often flawed by the human nature of the reviewers and the various external political pressures that may shape their motivations. For example, I have heard stories of pacts between leading groups for reciprocal generosity in peer review of journal manuscripts (which is possibly quite important in an era where editors of high-profile journals often do not even give a right to respond if 2 out of 3 reviews aren’t highly favourable!). Conflict of interest is also an interesting one; I often think that if a reviewer does not have some conflict of interest then they are perhaps not expert enough to give an incisive, insightful critique of the document!

    Much of this though comes down to the personality of the individual reviewer. Would it not be easy for the research councils to compile data on the reviews submitted by individual reviewers to provide additional data to aid the selection panels? For example knowing that a 3/6 from reviewer X is more like a 5/6 from an average reviewer and 6/6 from reviewer Y is more like a 4/6 when normalised across the panel’s remit.

    • Paul
      Physics as yet I don’t think has really had most of its portfolio up- or down-regulated with respect to funding, so what you feared wasn’t apparent – at least to my eyes. I think what you describe about citations is not uncommon; FSP blogged about this very phenomenon recently. I suspect sometimes it is just accidental rather than premeditated, because one is likely to read the papers of people you know and collaborate with more than those you’ve never come across before, but it can become self-reinforcing. I think refereeing can always serve as an opportunity to grind an axe, but there is some scope for reducing the weight given to an obviously unreasonable report if that is the case. As for EPSRC trying to normalise referees’ scores, I believe they once did look at whether there was a problem of sufficient magnitude to make it worth doing, and concluded that on average there wasn’t. But of course that probably didn’t look at the disciplinary differences I was alluding to.
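The per-reviewer normalisation Paul suggests can be sketched as a simple z-score adjustment: re-express each score relative to that reviewer’s own mean and spread, then map it back onto the panel-wide scale. The reviewer names and scores below are invented for illustration, and a real scheme would need far more reviews per referee to estimate their habits reliably:

```python
from statistics import mean, pstdev

# Hypothetical raw scores on a 1-6 scale: reviewer -> {proposal: score}.
scores = {
    "X": {"P1": 3, "P2": 2, "P3": 4},   # a severe reviewer
    "Y": {"P1": 6, "P2": 6, "P3": 5},   # a generous reviewer
}

# Panel-wide mean and spread across all raw scores.
all_scores = [s for revs in scores.values() for s in revs.values()]
panel_mu, panel_sigma = mean(all_scores), pstdev(all_scores)

def normalised(reviewer, proposal):
    """Re-express one score relative to the reviewer's own habits,
    then map it back onto the panel-wide scale."""
    own = list(scores[reviewer].values())
    mu, sigma = mean(own), pstdev(own)
    if sigma == 0:      # reviewer gives everything the same score:
        z = 0.0         # their scores carry no ranking information
    else:
        z = (scores[reviewer][proposal] - mu) / sigma
    return panel_mu + z * panel_sigma

for r in scores:
    for p in sorted(scores[r]):
        print(f"{r} {p}: raw {scores[r][p]} -> {normalised(r, p):.2f}")
```

On this data the severe reviewer’s middling 3 maps back to roughly the panel average, while the generous reviewer’s 5 – their worst score – drops well below it: broadly the adjustment Paul has in mind.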

  6. BB says:

    I agree totally with the comments above that suggest that we have little or no evidence that we are choosing to fund the correct grants. In particular one might wonder if the BBSRC re-reviewing is adding or subtracting value from the process.

    We are scientists and we should test that our methodology performs better than random. I believe that a special funding round should be held where proposals are submitted and reviewed and ranked but are funded either randomly or using some pre-agreed algorithm to select proposals across the scoring system. The outputs of the research should then be measured.

    My personal suspicion is that there will be no statistically significant difference in the outcomes for proposals at the top and the bottom of the ranking system…