What Does Excellence Look Like?

Harnessing the Metric Tide, the recently published follow-on to the 2015 report The Metric Tide, provides a welcome focus on cultures and practice within HEIs. It imagines an ecosystem in which metrics are collected to inform the community about the health of its working world, and in which inclusion is the norm. It fleshes out some of the ways the aspirations of the R&D People and Culture white paper, produced during Amanda Solloway's tenure as science minister, could become a reality. That was a document full of good intentions, but sadly lacking any levers to turn the hopes into practice.

Finding ‘good’ metrics seems a sensible way to go. Except… the problem is that metrics are very hard to get right. For instance, how many PhD students do you have in your department? The question sounds deceptively easy, but how do you define a PhD student? Is it those who are fee-paying – that should be easy – or do you include those who have passed that point but haven’t submitted? The latter might be a better measure of whether PhD students are being supported to complete in a timely manner, but could cause havoc in the numbers if part-timers are included. And should the cut-off point be submission, the viva, the University approving the degree, or its conferment? It is important to be precise, but also to work out what question needs answering: the sheer volume of students, or some measure of how well they are supported? A lot of students over-running may be an indicator of something going wrong. I use this example simply to illustrate that precise data are not easy to come by.
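To make the point concrete, here is a toy sketch of how the headline count shifts with the definition chosen. The record structure and field names are entirely hypothetical, not drawn from any real HESA return or institutional system:

```python
from dataclasses import dataclass

# Hypothetical student record -- fields are illustrative only.
@dataclass
class PhDStudent:
    fee_paying: bool   # still within the fee-paying period
    submitted: bool    # thesis submitted
    part_time: bool

def count_phd_students(students, definition):
    """Count the same cohort under several plausible definitions."""
    if definition == "fee_paying":
        return sum(s.fee_paying for s in students)
    if definition == "registered_not_submitted":
        # includes writing-up students past the fee-paying point
        return sum(not s.submitted for s in students)
    if definition == "full_time_not_submitted":
        return sum(not s.part_time and not s.submitted for s in students)
    raise ValueError(f"unknown definition: {definition}")

cohort = [
    PhDStudent(fee_paying=True,  submitted=False, part_time=False),
    PhDStudent(fee_paying=False, submitted=False, part_time=True),   # writing up
    PhDStudent(fee_paying=False, submitted=True,  part_time=False),  # awaiting viva
]

for d in ("fee_paying", "registered_not_submitted", "full_time_not_submitted"):
    print(d, count_phd_students(cohort, d))
```

With this three-person cohort the answer is 1, 2 or 1 depending on the definition – the same department, three different ‘facts’.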

There are statistics out there, for instance from HESA, but these are gathered for HESA’s own purposes, which may not mesh precisely with what is needed for a research assessment exercise. A specific example of the problem: changes to what data are collected from 2019/20 mean that data on non-academic staff, such as technicians, are no longer gathered, so there will be no easy way of obtaining these numbers from data HESA collects mandatorily. As has long been said of Athena Swan applications (something the Review panel was very mindful of when it reported in 2020), there are very substantial difficulties in collecting the necessary data, and this will apply to all sorts of metrics.

One of the troubles, visible over successive cycles of the assessment exercise (whatever it has been called at the time), is the ability of institutions to game the system. In 2008, when I was on the Physics sub-panel, one institution managed to hire a surprising number of eminent scientists from around the world and from industry, who worked for only about 48 hours around the crucial census date. Technically within the rules, but definitely not within their spirit. I forget how the panel dealt with this flagrant breach of intent, but deal with it we did. Others simply fail to understand the rules: in the same exercise, another institution (also nameless, although I remember its name well) didn’t flag any of its early career researchers and listed prizes won (‘measures of esteem’, as they were called back then) for only a handful of its most eminent researchers. This became a substantial handicap for them.

In the latest exercise, in which I chaired the Interdisciplinary Advisory Panel (IDAP), we would have liked some way for claims about interdisciplinary working to be backed up by evidence, perhaps by linking the environment statement to outputs to demonstrate that warm words were being translated into practice. Was the environment really conducive to collaboration across disciplines? This wasn’t possible this time around; perhaps some of the recommendations of this new report will foster better measures of success in this space, although it pays little attention to this specific issue. The interdisciplinary flag, which IDAP constructed in the hope it would facilitate the work of the panels, was applied so arbitrarily as to be useless. How do you construct something more meaningful that everyone understands and uses in the same way? This matters. The ability to transcend disciplinary silos is a key part of moving fields forward, and should not be judged solely from impact case studies, where indeed much evidence of success in crossing boundaries can be found. Good team working (even without any sense of interdisciplinarity) may exist within a single research group, across groups in a single department, or in something much more complex. What measures can be established to judge whether team working is indeed working, or whether a strong hierarchy persists in which only some of the participants are valued?

Excellence is, as many have said, not a useful word: it has a nice warm feel about it but is ill-defined and essentially non-quantifiable. The idea proposed in Harnessing the Metric Tide of establishing a ‘Research Qualities Framework (RQF)’ in place of a ‘Research Excellence Framework’, to be more inclusive, has many attractions, but the devil will be in the detail. Requiring the gender pay gap to be reported is one way of capturing aspects of the environment in terms of inclusivity, although grade segregation needs to be separated from within-grade discrepancies in pay. However, as more and more work is done on how women are, and more importantly are not, included in networks of collaboration, and on how most simple metrics show disadvantage, I think a lot of thought needs to go into how metrics regarding inclusion are chosen. For instance, given the evidence that women tend not to cite their own work and that men tend not to cite women’s work as often as men’s, I do not feel comfortable with the idea, already implemented in REF2021, that quantitative data in the form of article citation counts are provided to sub-panels that request them. Is this not likely to provide a false measure of success or ‘excellence’? Many factors, both cultural and sociological, feed into these discrepancies, but I am concerned that some of these metrics may compound disadvantage, and they need to be carefully scrutinised before implementation.
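The point about separating grade segregation from within-grade pay differences can be illustrated with a toy calculation (all names and salary figures invented): a headline pay gap can be large even when men and women at the same grade are paid identically, purely because one gender is concentrated in the senior grades.

```python
# Invented illustrative data: (grade, gender, salary).
staff = [
    ("professor", "M", 80000), ("professor", "M", 80000), ("professor", "F", 80000),
    ("lecturer",  "M", 45000), ("lecturer",  "F", 45000), ("lecturer",  "F", 45000),
]

def mean(xs):
    return sum(xs) / len(xs)

def pay_gap(records):
    """Mean male salary minus mean female salary, as a % of the male mean."""
    male = mean([s for _, g, s in records if g == "M"])
    female = mean([s for _, g, s in records if g == "F"])
    return 100 * (male - female) / male

print(f"headline gap: {pay_gap(staff):.1f}%")
for grade in ("professor", "lecturer"):
    subset = [r for r in staff if r[0] == grade]
    print(f"{grade} gap: {pay_gap(subset):.1f}%")
```

With these numbers the headline gap is around 17% even though the gap within each grade is exactly zero: the entire discrepancy comes from grade composition, which is why the two effects need to be reported separately.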

I applaud the report as a thorough review of how different groupings and countries are attempting to tackle transparency and openness in ‘measuring’ many different aspects of our research ecosystem. Finding ‘good’ statistics has to be a global aim, as does the goal of removing metrics that create perverse incentives (university rankings being one key bad actor in this landscape). However, there is a long way to go to establish interoperable mechanisms and data collection that will serve the maximum number of purposes with the minimum imposition of workload.
