I spent all of today attending the “In metrics we trust?” workshop organised jointly by HEFCE and the Science Policy Research Unit (SPRU) at Sussex University. This was part of the information-gathering process of HEFCE’s independent review of the role of metrics in research assessment; the review has a particular focus on how metrics might be used in the Research Excellence Framework (REF) that determines block grant allocations to university departments and research institutes. I was attending because I am a member of the review steering group.
The day promised to be one of vigorous debate because the consultation process that closed earlier in the summer had attracted over 150 responses — soon to be published — and these presented a wide range of views on the dangers and potential of metrics. And so it proved to be, with three panel sessions exploring “The changing landscape for research metrics”, “The darker side of metrics: gaming & unintended consequences” and the question of whether there can be reasonable progress “Towards responsible uses of metrics”. Sandwiched between these was a bazaar in which various metrics vendors displayed their wares.
I don’t have time tonight to capture the range of points and insights that were offered during the course of an interesting day but was somewhat reassured by the widely shared expressions of belief that any use of metrics in research assessment — or as a probe of the propagation or impact of research into the wider world — has to be done with care. The mantra that metrics should inform judgements and not replace them was repeated by many participants and will hopefully soon be enshrined in a set of principles to be known as the Leiden Manifesto.
Almost a lone voice, Prof Dorothy Bishop presented a provocative case for supplanting the cumbersome system of peer review in the REF with a much lighter-touch analysis of departmental h-indices calculated for research-active staff — an idea that she has previously outlined on her blog. Dorothy showed that, at least for some disciplines (including natural sciences and psychology), use of this metric generated scores that correlated well with resource allocations from the 2008 Research Assessment Exercise (the forerunner of the REF). (Update: as David Colquhoun points out below, Dorothy showed today that most of the correlation is actually due to the number of people in each department — and she has since detailed her proposals in a new blogpost). The particular advantages of this approach are the cost saving — reckoned to be somewhere between £60m and £100m — and the elimination of the bias that arises from panel members’ affiliations. But it remains to be seen if the method is applicable across all disciplines; or if it fulfils some of the other purposes of the REF, which include examination of broader impacts and demonstrating the commitment of UK research to quality control through periodic self-examination (a feature that plays well at the level of government).
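For readers unfamiliar with the indicator at the heart of this proposal: an h-index is the largest number h such that h papers each have at least h citations. How exactly a *departmental* version would be assembled is one of the open questions, but a minimal sketch of the underlying calculation (with entirely hypothetical citation counts, pooled across a department's research-active staff) looks like this:

```python
def h_index(citations):
    """Return the h-index: the largest h such that h papers
    have at least h citations each."""
    # Rank papers from most to least cited
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # at least `rank` papers have >= `rank` citations
        else:
            break
    return h

# Hypothetical citation counts for one department's papers
dept_citations = [44, 31, 12, 9, 8, 6, 3, 1]
print(h_index(dept_citations))  # → 6: six papers have at least 6 citations
```

The simplicity is the attraction — and also the worry, since (as the update above notes) so crude an aggregate can end up tracking department size rather than quality.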
I hope others might chime in with their impressions and analyses of the day. Already there is a Storify aggregation of some of the tweets that tracked the different sessions. I include below my contribution, which was part of the session on the darker side of metrics. It has been lightly edited to clarify and sharpen some points but remains brief and incomplete. This debate is far from over.
“I come here today very much with an open mind on many aspects of metrics, though I fear that may largely be because I am still somewhat confused. So I am glad to have the opportunity to participate in today’s discussions. Already, I am beginning to see some interesting things.
On some topics my mind is made up. I remain sick of impact factors, for example, because of the way that they are so commonly mis-applied in the assessment of individuals or individual pieces of research. I don’t need to rehearse the arguments that I laid out in a blog post of the same name in 2012, except to say that impact factors are a powerful illustration of how a relatively innocent innovation in quantitation can be perverted and do real damage to the research community. I don’t think there is much dispute on that point (though I was surprised and disappointed to come across defenders of this metric in the discussion at the end of this session).
I am worried about people being seduced by the apparent objectivity of numbers. We saw something of that last week in the excitement whipped up by the announcement of the World University Rankings in the Times Higher Education (THE). In the preamble to its explanation of the methodology the THE describes the ranking process as a “sophisticated exercise” that is “carefully calibrated” to provide “the most comprehensive and balanced comparisons”. It ranks universities on a composite score drawn from estimates of a range of indicators of teaching, research volume and influence, industrial income, and international outlook.
The Times Higher are good enough to be open about the methodology but when you read exactly how they assemble and weigh the various components, you read statements such as “we believe…”, “UGs tend to…”, “our experts suggested that…” or worse: “the proxy suggests that…”. And so you can see that, although it may be sophisticated, the measure is clearly also subjective. It is not sophisticated enough to assign error bars or confidence intervals to the scores given to universities and I think that’s unhealthy. It seems as if the rankers are laying claim to a level of precision that cannot be justified.
And that tendency for numerical ‘measures’ to wrap themselves in a pseudo-objective authority is a longstanding problem with metrics; in the end people adopt them without thinking hard enough about where they came from.
As a result, I am worried about the word ‘metric’. It implies measurement but, although there is now an increasing number of things that we can count — thanks to growing computerisation and the connectedness brought by the internet — there is still much uncertainty (as we heard this morning from Cameron Neylon) about what those numbers are measuring or what they mean. We still struggle to define quality and impact, never mind being able to measure them. But that is OK and we should not be shy about admitting the difficulty of making judgements about quality or impact — or conceding the limitations of the things that we are counting.
But I think it would be more honest if we were to abandon the word ‘metric’ and confine ourselves to the term ‘indicator’. To my mind it captures the nature of ‘metrics’ more accurately and limits the value that we tend to attribute to them (with apologies to all the bibliometricians and scientometricians in the room).
As someone who is from Ireland, where we have been telling stories for thousands of years — from a time before stories were written down, never mind cited and counted — I was pleased to have heard the word ‘story’ (or its posher cousin ‘narrative’) mentioned so many times in the session this morning. Stories matter to people and although it is now a commonplace to assert that ‘the plural of anecdote is not data’, I wonder if that is always true.
I think that in some ways the diversity of activities and qualities and impacts that are part and parcel of the academic enterprise can only be captured in stories and in narratives. We should be honest about our limited abilities to describe these attributes with quantitative indicators. More than that, we should not be shy about celebrating the wonderful stories that we can tell. I look forward to the publication of the REF2014 narratives (sorry, stories) because I think many of us will be pleasantly surprised to find out about the different ways that research work has vaulted over the walls of academia and into the real world — where it matters.
And finally, wearing the hats associated with my involvement in Science is Vital (SiV) and the Campaign for Science and Engineering (CaSE), I want to emphasise the important political dimension of the REF, which is that it provides a mechanism for the research community to demonstrate that it is accountable — to government and to the tax-payers who fund us. I think that is important. (And I think it is important for researchers on the ground to buy into the process and participate — it is not sufficient to leave this to provosts, vice-chancellors and research managers).
With that in mind, and not forgetting the limitations of quantitative indicators, researchers shouldn’t be too prissy about using numbers that have some meaning — especially if they are aggregated at levels that can attenuate the noise in the system. At SiV and CaSE, the case for continued investment in UK science is based in part on the productivity and quality of our research base, estimated in part through numbers of publications and citation rates. The UK has 1% of the world’s scientists but produces 6% of publications, and about 14% of the most highly cited papers. Do we really believe those numbers are meaningless? They are not the whole story of course. It is just as important — I am aware of the presence of sophisticated policy analysts such as Ben Martin and Andy Stirling in the room today — to be able to talk about the need to maintain a research and university infrastructure so that we have generative and absorptive capacity for innovation. (Not to mention the intrinsic value that research gives to human existence by satisfying our curious nature).
So although there are risks, I think we should count on some indicators to inform our judgements, to test and challenge our stories (so as to mitigate our biases), and to help us tell those stories to ministers and the public. Those risks are real but I think they can be counteracted by transparency and debate. I am optimistic that the research community is up to that challenge.”