Ranking organisations are seeking to diversify the measures used to evaluate universities. But without addressing the fundamental flaws in their methods, they will crush rather than embrace the rich complexity of our institutions of higher learning.
When the Times Higher Education (THE) released their University Impact Rankings back in April, the idea of scoring institutions on the basis of their contributions to the UN sustainable development goals was hailed by THE’s ranker-in-chief, Phil Baty, as ground-breaking. The new rankings are certainly innovative, but for many academics it seemed like business as usual. Here was yet another way to slice and dice the data on university performance from a global industry that, although non-existent 20 years ago, has become a permanent feature of the higher education landscape. New rankings come so thick and fast these days – Wikipedia lists no fewer than 24 different league tables – that the temptation is to shrug and pass on to the next thing. But that would be a mistake.
On the face of it, THE are to be commended for diversifying the scrutiny of university performance beyond the debatable measures of reputation, research prowess or teaching excellence that are most commonly used to compile rankings. This is a sensible move. But fundamental problems remain which afflict all university league tables and are too often overlooked by students and universities – the major consumers of rankings. Students might be forgiven for not appreciating the uncertainties and incompleteness of the analyses that pull university rankings dangerously close to the realm of fake news, but universities should know better.
I suspect that most do know better but feel compelled nonetheless by market pressures to jostle for position in rankings that originated in the news media and are still granted widespread coverage. I don’t doubt the rankers’ belief in the valuable service they see themselves providing to the sector; and I’m sure they can justify to themselves the hard-nosed, real-world pragmatism brought to the task of scoring university performance, which may even be shared by some university leaders. But we can’t afford to side-step the fractured logic at the heart of league tables – the simplistic quantification and aggregation of incomparable qualities – which tears at the intellectual integrity that is surely still central to the idea of the university. We need to find better ways to set the standards for the multiple and various dimensions of university life so that, as they evolve in a constantly changing world, they can continue to enrich the societies and communities in which they are rooted.
And now, perhaps, it is the turn of the rankers to shrug at the ramblings of yet another disgruntled academic. It’s all too easy to criticise and carp, they might say, but what about solutions? Students and governments rightly want to know about standards of performance, so who is going to hold universities to account?
Well, here’s an idea: let’s try to do it together. Cynics, please look away now. Let’s take at face value the higher aspirations for the health and impact of universities found in the mission statements of rankers and our institutions of higher learning, and proceed in the hope that they might help us jointly to unpick the tangle of political, commercial, and institutional interests that sustains the rankings industry in its present form.
Even if there is the will to do that – and I concede it remains an open question – the technical challenges are formidable. But before we can devise solutions, we need to understand the problem. We can start to get to grips with that by looking at how the THE put together their University Impact Rankings. To their credit, the THE provide a detailed description of how they go about scoring universities. Let’s look at the big picture first. Here’s the summary of how the overall rankings are compiled:
“We use carefully calibrated indicators to provide comprehensive and balanced comparisons across three broad areas: research, outreach, and stewardship.
“There are 17 UN SDGs and we are evaluating university performance on 11 of them in our first edition of the ranking (click on a category below to view its specific methodology):
- SDG 3 – Good health and well-being
- SDG 4 – Quality education
- SDG 5 – Gender equality
- SDG 8 – Decent work and economic growth
- SDG 9 – Industry, innovation, and infrastructure
- SDG 10 – Reduced inequalities
- SDG 11 – Sustainable cities and communities
- SDG 12 – Responsible consumption and production
- SDG 13 – Climate action
- SDG 16 – Peace, justice and strong institutions
- SDG 17 – Partnerships for the goals
Universities can submit data on as many of these SDGs as they are able. Each SDG has a series of metrics that are used to evaluate the performance of the university in that SDG.
Any university that provides data on SDG 17 and at least three other SDGs is included in the overall ranking.
As well as the overall ranking, we also publish the results of each individual SDG in 11 separate tables. This enables us to reward any university that has participated with a ranking position, even if they are not eligible to be in the overall table.
A university’s final score in the overall table is calculated by combining its score in SDG 17 with its top three scores out of the remaining 10 SDGs. SDG 17 accounts for 22% of the overall score, while the other SDGs each carry a weighting of 26%. This means that different universities are scored based on a different set of SDGs, depending on their focus.”
Setting aside the question of what exactly is meant by “carefully calibrated”, the first major problem with this approach is the arbitrariness of the selection of SDGs. Only 11 of the 17 goals are included. The missing SDGs are:
- SDG 1 – No poverty
- SDG 2 – Zero hunger
- SDG 6 – Clean water and sanitation
- SDG 7 – Affordable and clean energy
- SDG 14 – Life below water
- SDG 15 – Life on land
The exclusion of these goals was presumably part of the careful deliberations that went into the construction of the tables, but the reasoning is hard to figure out. If your university is doing work on economics or social policy that might tackle poverty, that doesn’t count. The same applies if your institution is active in areas of food or agricultural technology, engineering solutions to the supply of clean water or clean energy, or if it researches environmental issues or biodiversity. Research and education in all of these domains have the potential for world-changing impact but this is effectively given zero weight by the THE. Perhaps there are technical reasons for the omission – a lack of sufficiently relevant indicators or data – but these are not given in the methodology. The aggregate estimation of impact by the THE is therefore incomplete.
The second major problem, which is more serious and more striking, is the non-comparability of the overall scores, since these are based on quantification of different activities at different universities. The impact score of a university is its SDG 17 score added to its highest scores for three other SDGs. This is a pragmatic rather than a scientific choice. And given that what we are talking about here are universities, supposedly society’s stoutest bastions of scholarly critique, it’s an astonishing one. The methodology is even more arbitrary than ‘standard’ league tables that at least score universities on a common set of categories (an approach that still fails to address the issue of assessing overall performance from non-comparable attributes – see below). It’s the equivalent of trying to figure out who is the best at sports by compiling a single league table that ranks footballers, tennis players and racing drivers. There is no intelligent way to do it. In the domain of sport, what’s the harm? The rankings would generate stories that people would have fun debating. But do we not take our universities more seriously? It is hard to escape the suspicion that news values are behind the desire to reduce university performance to a single number. Rankers can and should do better.
Who’s the best? (Original photos via Wikipedia)
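To make the non-comparability concrete, here is a minimal Python sketch of the combination rule as quoted above (SDG 17 at 22%, the three best remaining SDGs at 26% each). The function name `overall_score` and the two sets of scores are my own inventions for illustration, not THE's code or data, and I am assuming each SDG score sits on a 0-100 scale.

```python
# Weights quoted in the THE methodology summary.
SDG17_WEIGHT = 0.22   # SDG 17 is compulsory
OTHER_WEIGHT = 0.26   # applied to each of the three best remaining SDGs


def overall_score(sdg_scores):
    """Return (overall score, SDGs used) for a dict of {SDG number: score}."""
    if 17 not in sdg_scores:
        raise ValueError("SDG 17 is required for the overall ranking")
    others = {sdg: s for sdg, s in sdg_scores.items() if sdg != 17}
    if len(others) < 3:
        raise ValueError("at least three SDGs besides SDG 17 are required")
    # The three best-scoring SDGs are chosen per university,
    # so the basis of the score differs from one institution to the next.
    top_three = sorted(others.items(), key=lambda item: item[1], reverse=True)[:3]
    total = SDG17_WEIGHT * sdg_scores[17] + OTHER_WEIGHT * sum(s for _, s in top_three)
    return total, sorted(sdg for sdg, _ in top_three)


# Hypothetical scores, invented purely for illustration.
university_a = {17: 70.0, 3: 80.0, 4: 75.0, 5: 60.0, 13: 40.0}
university_b = {17: 70.0, 9: 80.0, 11: 75.0, 12: 60.0, 16: 40.0}

score_a, basis_a = overall_score(university_a)
score_b, basis_b = overall_score(university_b)
print(score_a, basis_a)
print(score_b, basis_b)
```

The two invented universities end up with the same overall score, yet apart from SDG 17 they have not been assessed on a single goal in common. The number looks comparable; the things it summarises are not.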
Third, even within the scoring mechanism for each of the different SDGs there are issues of arbitrariness and incompleteness. For example, below in outline is how SDG 3 – good health and well-being – is scored. The ranking focuses on a number of disparate but related elements: “universities’ research on the key conditions and diseases that have a disproportionate impact on health outcomes across the world, their support for healthcare professions, and the health of students and staff.” The total score for SDG 3 is totted up from the following components, which track both inputs and outputs:
- Research on health and well-being (27%)
  - Proportion of research papers that are viewed or downloaded (10%)
  - Proportion of research papers that are cited in clinical guidance (10%)
  - Number of publications (7%)
- Proportion of health graduates (34.6%)
  - “proportion of graduates who receive a degree associated with a health-related profession out of the institution’s total number of graduates.”
- Collaborations and health services (38.4%)
  - Collaborations with local or global health institutions to improve health and wellbeing outcomes (8.6%)
  - Outreach programmes in the local community to improve health and wellbeing (8.6%)
  - Free sexual and reproductive health services for students (8.6%)
  - Free mental health support for students and staff (8.6%)
  - Community access to university sports facilities (4%)
As with all such measures, the assigned weightings are arbitrary (and in this case, inexplicably precise). The selected components seem like reasonable targets but what exactly is being measured here? For example, how does one score a university’s provision of “free mental health support for students and staff”? Is there any measure of the standard of service – or the mental health improvement of the recipients? The THE asks for supporting evidence; this is “evaluated against a set of criteria and cross-validated where there is uncertainty.” But the criteria are not given, and although it is good to see acknowledgement of the uncertainties in the information being used to compile scores, there is no estimation or reporting of these uncertainties in the rankings – a long-standing problem that has been raised before but which no ranking organisation has properly addressed.
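How much does the choice of weights matter? The sketch below, in Python, scores two hypothetical universities on the three top-level SDG 3 components, first with the weights listed above and then with equal weights, which are no more and no less defensible. The component scores are invented for illustration and the helper names `weighted_score` and `ranking` are mine; I am assuming each component is scored from 0 to 100.

```python
# Top-level SDG 3 weights as listed above (percent).
THE_WEIGHTS = {"research": 27.0, "health_graduates": 34.6, "collaborations": 38.4}
# An equally arbitrary alternative: give every component the same weight.
EQUAL_WEIGHTS = {name: 100.0 / len(THE_WEIGHTS) for name in THE_WEIGHTS}


def weighted_score(component_scores, weights):
    """Weighted average of component scores, normalised by the total weight."""
    total_weight = sum(weights.values())
    return sum(weights[name] * component_scores[name] for name in weights) / total_weight


def ranking(universities, weights):
    """University names ordered from highest to lowest weighted score."""
    return sorted(
        universities,
        key=lambda name: weighted_score(universities[name], weights),
        reverse=True,
    )


# Hypothetical component scores, invented purely for illustration.
universities = {
    "University A": {"research": 90.0, "health_graduates": 40.0, "collaborations": 60.0},
    "University B": {"research": 50.0, "health_graduates": 70.0, "collaborations": 62.0},
}

print(ranking(universities, THE_WEIGHTS))
print(ranking(universities, EQUAL_WEIGHTS))
```

With these invented numbers the order of the two universities flips when the weights change, although nothing about either institution has changed. That is the sense in which an arbitrary weighting produces an arbitrary ranking.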
The scoring of SDG 12 – responsible consumption and production – takes a different approach. Here the focus is on “efficient use of resources and minimising waste”. The total score comprises the following components:
- Research on responsible consumption and production (27%)
- Operational measures (26.7%)
  - Policies on ethical sourcing of goods (4.9%)
  - Policies on the appropriate disposal of hazardous waste (4.9%)
  - Policies on minimising waste sent to landfill/maximising recycling (4.9%)
  - Policies on minimising the use of plastics (4.9%)
  - Policies on minimising the use of disposable items (4.9%)
  - Evidence that these policies also apply to outsourced services (1.1%)
  - Evidence that these policies also apply to outsourced suppliers (1.1%)
- Proportion of recycled waste (27%)
  - Proportion of waste that is recycled (13.5%)
  - Proportion of waste that is not sent to landfill (13.5%)
- Publication of a sustainability report (19.3%)
Again, while each of the components seems a sensible choice, the score is made up of disparate elements assigned arbitrary weightings. Why should publishing a sustainability report earn almost as much credit as operational measures? There’s no good answer to that – it could be debated endlessly.
As a single estimate of activity in this one area, it may be reasonable to suggest that this method for evaluating progress towards SDG 12 is good enough to win a consensus of sorts among people keen to get on with the job of ensuring that the university uses resources well. But we still have to come back to the fundamental problem with the THE’s overall impact ranking: it depends on an arbitrarily weighted sum of the scores for disparate activities, each of which is itself the arbitrarily weighted sum of the scores for disparate activities. It is in this unreasonable aggregation that the system falls to illogical pieces.
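The nesting can be unpacked with a little arithmetic. The Python sketch below multiplies the 26% weight that a top-three SDG carries in the overall score by the SDG 12 component weights listed above, to show how much of the overall score each component would carry. It assumes SDG 12 happens to be among a university's three best SDGs; the function name `effective_weights` is mine, not part of the THE methodology.

```python
# Weight of one of the three best SDGs in the overall score (from the quoted summary).
OTHER_SDG_WEIGHT = 0.26

# SDG 12 component weights as listed above, expressed as fractions of the SDG 12 score.
SDG12_WEIGHTS = {
    "research": 0.27,
    "operational_measures": 0.267,
    "recycled_waste": 0.27,
    "sustainability_report": 0.193,
}


def effective_weights(outer_weight, inner_weights):
    """Share of the overall score carried by each inner component."""
    return {name: outer_weight * weight for name, weight in inner_weights.items()}


# Only meaningful for a university whose top three SDGs include SDG 12.
for name, share in effective_weights(OTHER_SDG_WEIGHT, SDG12_WEIGHTS).items():
    print(f"{name}: {share:.1%} of the overall score")
```

On this reckoning, publishing a sustainability report is worth about 5% of the overall impact score for a university whose best three SDGs include SDG 12, and nothing at all for one whose best three do not. Each product of two arbitrary weights is itself arbitrary, which is the point.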
And remember – these are just the impact rankings. While they might represent a well-intentioned attempt to diversify the benchmarking of university performance and to pull attention away from the narrower focus of standard rankings, I wouldn’t want to see them aggregated into the single score used in the THE’s Global university rankings (and am not aware of any plans to do so). Of course, this means that when the 2020 global rankings are published later this year, much of the attention recently devoted to trying to recognise impact will be lost. That raises the question of how serious the THE is about giving people a truly holistic view of how well the universities of the world are fulfilling their different missions.
There is a better way. That is to include all reasonable estimates of valued university activities (such as their academic and societal impact, and the quality of internal processes such as staff management, resource management, and the various dimensions of student experience) but without aggregating the data. Rankers need to embrace the full complexity and diversity of what universities do, while at the same time being more open about the uncertainties in the measurements and the incompleteness of their analyses.
This disaggregated approach has already been adopted by the Leiden Ranking generated by that university’s Centre for Science and Technology Studies (CWTS). As such, it is an embodiment of the CWTS principles for the responsible design, interpretation and use of university rankings. The Leiden Ranking may be more narrowly focused in scope than the suite of rankings generated by the THE, but like them it has started to incorporate elements that go beyond purely academic impact, such as gender balance and institutional commitment to open access. Even with disaggregation, one has to think carefully about institutional and disciplinary contexts to make full use of these data, as Cassidy Sugimoto and Vincent Lariviere have shown in a recent analysis of the gender data on academic authorship. Such care and insight fades from view when university ‘performance’ is boiled down to a single number, and is lost from sight altogether amid the headlines of this or that university moving up or down the resultant tables.
There are risks in expanding the number of measures of university performance (even if one adheres to the principle of disaggregation). The burden of measurement – which seems to have an inexorable tendency to grow – may soon begin to outweigh the benefits of evaluation. There is something dehumanising in seeking to put a number on every particle of human activity, however worthy the aim; and finding the appropriate balance between quantitative and qualitative modes of evaluation is a task that demands constant vigilance and negotiation. But the risks should be mitigated by involving the measured in co-designing the processes of evaluation. The trick is to find effective ways for different stakeholders to do so in good faith.
I believe Phil Baty is sincere when he says “Universities make the world a better place in so many different ways. @timeshighered is delighted to champion that work…” That is a vision shared by many who lead and work in universities – and one that surely has an enduring appeal to students. But the credibility of that vision depends critically on our ability to develop a collaborative approach to delivering it. If you read the rest of Baty’s tweet it says “…with the new University Impact Rankings, out in April #THEglobalpact #SDGs”. The key question for Baty – and all university rankers – is how willing they are to take the risks of engagement with the sector, and the serious critique of their methodologies. Consensus may not be possible – academia is perhaps too fractious and ranking organisations are clearly constrained by their commercial interests – but think of the impact we might have if we could find a sustainable way to work towards a common goal.