Why I Can’t Write Anything Funny about the REF

It’s the silly season, a time of year when many people are on holiday and big news tends to be in short supply, apart from the annual excitement over A-level grades. (Mind you, not so this year, when there is plenty of horrendous news coming from Egypt to occupy the mind.) Of course much of the world believes academics only work for about half the year anyhow, but for many of us this season is a precious time to regroup, catch up and generally think more deeply in a way the hurried pace of the rest of the year prevents. There is also the pleasure that a reduction in the email deluge brings, because so many people are away and by and large committees are not meeting. Now that I am not tied to taking breaks during school holidays, I suffer from the fact that the times I am away (July and/or September) are much less quiet on the email front, but that is a relatively small price to pay for the locations I visit being generally peaceful, with fewer tourists and cheaper prices.

This year, however, whatever the media may care to believe about Oxbridge academics, I have my nose to the grindstone with REF preparations. Having had the summer of 2008 destroyed by sitting on an RAE panel, reading what felt like an interminable number of outputs and attending multiple meetings in a rain-drenched Lake District, I vowed not to participate in another such panel. And I have stuck to that, although I did take part in the REF Pilot exercise in Physics. This was a fairly low-key affair, with only a handful of meetings and limited paperwork to plough through, but it left me with a pretty negative impression of the process. Nevertheless, the downside of not serving on the actual REF panel is that I have the dubious pleasure of chairing my university’s Physics and Astronomy local committee. So, whilst other people gad about, I am fretting about the accuracy of entries for all our outputs and composing the prose to accompany the submission documents covering both ‘Environment’ and ‘Impact’.

One of the more challenging conundrums of this round of submissions is that the statements covering both environment and impact, as well as the actual number of impact case studies required, depend on the number of FTEs (full-time equivalents) submitted. This number is, unfortunately, a moving target – at least in a department as large as mine. People are still being head-hunted to move elsewhere and may need to be removed, we have one or two big names awaiting completion of their visa paperwork, and we will have a new batch of research fellows arriving at the start of the new academic year. We probably know about most of these fellows, but there will always be one or two who turn up unexpectedly due to a failure in communication somewhere along the line. As things stand, we hope we are sitting comfortably mid-band, so that we won’t suddenly find we need to start a new impact case study from scratch. But there’s many a slip….

It is in part for these reasons that, unlike the horror stories recently reported from Leicester implying there will be ‘consequences’ for eligible staff not entered into the REF this time around, Cambridge’s Code of Practice says the complete opposite and makes it clear that this will not occur in our University. It is appreciated that if the addition of one more individual tips a department over into requiring a completely new case study, such an addition may make no sense whatsoever. So, the Code of Practice allows people strategically not to be entered and, in the interests of equity, there is no way this omission can be used as a black mark against any individual. I had understood other universities were operating in the same way, so it is depressing to hear the Leicester story (of course, there is always the possibility the reporting is inaccurate or misleading).

Having had my mind focused on the REF for some time, I have been racking my brains for how to write a further light-hearted blogpost on the topic, as I did just before Christmas. To be honest, it is no joke. Nothing I can think of could be regarded as the least bit light-hearted or amusing, as well as true and relevant. This is deadly serious, and the work of preparation isn’t amusing either. It is necessary to be meticulous in detail (not always my strong point, but there are others in the department who are far better at that than me) and conscious of the guidelines, to make sure every requisite point is appropriately covered. Sitting on the RAE panel last time, it was only too obvious that the odd department had slipped up and omitted to mention some important point – student training, awards to Early Career Researchers or availability of technical support, the sort of thing that could so easily get overlooked in the grand scheme of things – and so was docked a metaphorical mark or two. As with examinations, answering the question as posed is all-important.

So, no light entertainment for me over the REF, no anecdotes to spill (which might either give away too much or be meaningless) and certainly no peace over the vacation. Instead, this is simply heavyweight, worrying stuff. The time commitment of many members of staff is very substantial; the associated cost of this time (of those of us who prepare the documents, of those in the University who vet them and of those on the panels who have to pore over them) doesn’t bear thinking about. It is no longer the light-touch exercise it was in its initial manifestation in the 1990s. Like so much in our society it seems to get more burdensome each time (see this recent account of what drove a new maths teacher out of the profession, where ever more paperwork is required to satisfy accountability box-ticking). Is it still serving any useful purpose? It’s hard to be convinced of that. I suspect that all this time and effort merely means we will end up with answers that could have been predicted with reasonable accuracy anyhow. Time will tell, but nevertheless I know how awful I will feel if any sloppiness on my part ultimately turns out to have let the side down in my own neck of the woods.


This entry was posted in Research, Science Funding and Uncategorized.

62 Responses to Why I Can’t Write Anything Funny about the REF

  1. @robertmbowman says:

    Great post – not funny for people being ‘marginalised’ in departments for non-submission. Don’t think many uni senior management teams & research admin have read/understand the rules/options for inclusion of staff across sections of a UoA, and they are still wedded to inclusion as a proxy for ‘performance’ of staff in general. A fundamental flaw in the REF and its predecessors is not examining all research; the skewed funding formula prevents this, which is just daft.
    For physics it is critical for the discipline that we don’t get the 2008 scenario, where more than 20 computer science departments had a profile GPA higher than the best physics submission. That did huge damage in universities that used the results for internal beauty contests / resource allocation in the following few years. Anyway, after the vacation I’m back to a final buff of the TWO impact docs I’m ‘responsible’ for.

  2. REF Champion says:

    My university does not allow me to strategically not enter people. Then again, we’re well over the nearest impact case-study border, so the problem has not occurred. I’ve been working on the REF for more than a year but it’s getting worse and worse over this summer. If I get one more bloody review of my REF3 and REF5, I might lose the plot. Can’t even imagine the horror of actually being on the RAE or REF panel.

  3. Dave Fernig says:

    A most enjoyable post – perhaps universities should pay for a post REF conference of their REF Wallahs – this would be a reward in the form of a random, multidisciplinary exercise to help re-engage in thinking after wading through the mud of REF.

    Encouraging to see Cambridge being sensible about returnees vs non returnees.

  4. Pingback: REF: update | Ferniglab's Blog

  5. As a long-time critic of the REF (http://bit.ly/GBXgxB) I can only sympathise with Athene’s plight. It is thoroughly depressing to see high-calibre academic minds spending hours on preparing, revising and revising again REF submissions, rather than doing the original research they are good at. But I do feel that the academic community has brought this situation on itself, and it is interesting to consider why a highly intelligent body of people has let this happen.
    There have been various demonstrations that, at least in the sciences, the REF is ludicrously inefficient. I carried out a back-of-the-envelope study of how the 2004 RAE funding outcomes in psychology correlated with the H-index of the department, and found them to be highly correlated (http://bit.ly/V9f23R). So basically you can get virtually the same rank ordering of departments using metrics that could be assembled in a week or so. And insofar as there were outliers, they were at least partly accounted for by benefits accruing to departments with a member on the REF panel, suggesting an objective index based on outputs might be less subject to bias. The same has been shown for physics (http://bit.ly/13pcxxk). See also this recent comparison of citation-based measures and peer review for other subjects: Mryglod, O., Kenna, R., Holovatch, Y., & Berche, B. (2013). Comparison of a citation-based indicator and peer review for absolute and specific measures of research-group excellence. Scientometrics. doi: 10.1007/s11192-013-1058-9.
    At one point it looked as if HEFCE was going to move to a metrics-based evaluation, at least for the sciences, but as far as I can tell this was overwhelmingly rejected by the academics themselves. Metrics are denigrated as crude and imprecise, whereas the subjectivity of panels scrutinising shedloads of documents is seen as a plus, even though it involves everyone in enormous amounts of work, because it might just pick up on something that is missed by the metrics. We should instead be asking: do metrics give us the same answer as expert panels, and if not, what factors are panels using that metrics miss? And if we find that panels are somehow more accurate at capturing elusive aspects of research quality, is this important enough to justify the cost of numerous academics spending hours of their time attempting to quantify it?
    Even more basic than the need to examine cost-effectiveness of the REF is the whole logic of the exercise. We give research funding to the universities who are doing well, and deny it to those who are not. Of course, there are good reasons for this; we want to encourage success and ensure that productive institutions can cover costs of staff and infrastructure. The REF is admired in many countries where nepotism or politics determines who gets hired and fired: our system motivates institutions to hire those who are excellent, regardless of anything else. But when funds are squeezed, one consequence of the current system is that all the funds accumulate in a few elite institutions. Not everyone can be an Oxford or Cambridge: there is a need in the UK for solid middle-ranking Universities which can develop areas of research expertise. The rules for REF2014 make it clear that only 4* research will bring rewards. I worry that one consequence of the current exercise will be that many good departments will be starved of research funding, and at risk of closure.

    • Dorothy, you’re right of course. A lot of people got very bothered by the idea of only metrics being used. I suspect that may have been more on the humanities side than in the sciences, but I may be misremembering. The trouble is that metrics can be so horribly damaging, as we’ve seen recently with impact factors. Furthermore, citations vary from field to field, so slavishly using some number, without an actual person tempering it, would be pretty horrendous. For instance, I believe synthetic chemistry method developments get far more citations than other parts of chemistry, so that effect could be hugely distorting too. I don’t think it would work without a lot of thought on someone’s part to make sure different (sub-)fields were treated appropriately. It would of course have saved all this hassle, but probably only by creating other problems. The real issue is the feeling in Whitehall that accountability and quantification of research is necessary – as soon as that is accepted, there will be problems of one sort or another.

      • No measure is trouble-free, and all can be gamed, but I think you are introducing a red herring by mentioning journal impact factors, which have nothing to do with the quality of individual publications. And please note that I am, like the others cited in my response, talking about computing an H-index for the whole unit of assessment, not for individuals. Yes, this varies from discipline to discipline, but if you compare like with like, that’s not an issue. And the point is that, whatever voodoo the panels are using to get a “fairer” judgement, if the end result correlates over .9 with the departmental H-index (see Mryglod et al), is it worth all that effort – given that it also runs the risk of subjective bias?

        • I realise Impact Factors aren’t immediately relevant to your model, but they may be in other plausible ones (plausible in the eyes of the powers-that-be, that is). You have chosen one model, but I can see that many other metrics-only ones could be suggested, including ones where IFs featured. Furthermore, I carefully chose a within-discipline example of where citations vary widely – there would be plenty of others. This means that an H-index for a professor of synthetic chemistry is likely to be higher than for a physical chemist, say. Don’t you think that would just lead to other kinds of game-playing? Yes, it would save the grief I’m going through (currently my Saturday is being spent working on the Impact Template, from which responding to you is a welcome diversion), but it would annoy other people in other ways. As I say, the real problem lies in the feeling that academics can’t be trusted to behave with taxpayers’ money, so we have to be endlessly quantified to check what we’re up to.

        • Another point. The H-index is notoriously unfair to young people, the very people we need for the future.

          • Mark Johnson says:

            I thought one of the issues with metrics was that many departments (and even many careers) are a mix of disciplines with different citation patterns. As mentioned above, this creates pressures to focus on those disciplines in the mix with high citations. We do know that some fields with low citation rates (e.g., Pure Maths) are worth keeping. You can normalize by field, but fields are not easy to define.

            Using ISI to define fields seems an issue, as it basically hands the process over to the subjective decisions of a company. Normalization will always require some subjective judgement, making it highly political as to who chooses the normalization and with what agenda. End result: metrics have not avoided some of the issues that complicate the RAE/REF. I think the decision to keep peer review therefore reflects academics’ hope that, while the RAE/REF may have faults, this approach is slightly fairer than a metrics-based system.
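As a toy illustration of the field-normalization idea debated above: dividing a paper's citation count by its field's average. Everything here is invented for the sketch (the baselines, the field names and the helper `normalised_score` are mine, not from any real exercise):

```python
# Hypothetical sketch of field-normalised citation scores. The field
# baselines below are invented; a real exercise would need an agreed
# (and, as the comments above note, inevitably contested) field scheme.

field_mean_citations = {
    "pure maths": 4.0,            # low-citation field
    "synthetic chemistry": 30.0,  # high-citation field
}

def normalised_score(citations, field):
    """Citations relative to the field's mean: 1.0 means typical for the field."""
    return citations / field_mean_citations[field]

# A maths paper with 8 citations outperforms its field by more than a
# chemistry paper with 30 citations does.
print(normalised_score(8, "pure maths"))           # 2.0
print(normalised_score(30, "synthetic chemistry")) # 1.0
```

Even this trivial version shows where the politics enters: the result depends entirely on who sets the baselines and on how papers are assigned to fields.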

  6. Th says:

    If RAE (and probably REF) closely match the h-factor for departments, does that not suggest that the research environment and impact submissions are not being fully taken into account? It’s a plausible hypothesis that they are correlated, but it’s not a given. I agree that REF is a waste of time and money if it ends up following h-factors, but that might suggest that REF isn’t working the way it could, rather than that metrics are the way to go.

  7. I certainly don’t think that metrics would be a good substitute. They might start off being used to judge institutions, but they’d soon be used to judge individuals, with the well-known dire consequences. A much better solution would be to forget the whole thing. Less and less money comes from HEFCE, and grant applications are peer-reviewed already. I know that process isn’t perfect, but doing it all again through the REF is duplication of effort.

    That’s one reason that I’m a bit concerned about the Snowball Metrics development http://www.snowballmetrics.com/. Although this is advertised as yet another way of ranking universities (as if there were not more than enough already), I suspect that HR people would not be able to resist the temptation to use it to fire people.

    I’m also not very happy to see that Snowball would tie us to Elsevier. That is not very helpful to the movement for open access.

  8. Brian Sloan says:

    The use of metrics as a replacement for the REF, such as an H-factor based on citations, would be disastrous in the humanities. Even though it’s not necessarily deliberate, I tend to give preference in citations to things that are easily accessible (though admittedly open access – problematic in itself – could have an effect on this) and are written by people of whom I’ve heard. Citation is not necessarily a mark of quality (and in fact it can be the opposite), and I often use footnotes to exclude issues that are *not* the subject of the relevant paper. People who are less well known, are working in obscure areas or are publishing in obscure journals would clearly be prejudiced by a metrics-based system, and the system could easily be manipulated through institutions encouraging staff to cite colleagues’ work even where it’s not academically necessary.

    The REF is an enormously bureaucratic, problematic and damaging process, but it does at least attempt to measure quality per se. Perceived shortcuts like metrics would probably make matters worse.

  9. Bert Timmermans says:

    I agree that metrics as a substitute would be disastrous. I dislike metrics, despite the fact that they seem to breed like rabbits nowadays. I left ResearchGate the moment they introduced their own “impact” metric.

    Still, isn’t the problem the same as with grants, where you often think they should either just rate the proposal, or otherwise just hand over the money based on publications so you can save time? I mean, a grant application usually has more text on impact, outreach, application, and how the work will benefit the UK, the EU or whoever, than on the actual research. But in the end your publication list weighs heavily. Or does it? I think the disadvantage of the way things are done now, for grants and for the REF, is that it is vague and opaque, or at least leaves room for vagueness. Funnily enough, I think this is precisely why people like it. People and groups with high-impact work can believe they’ll come out ahead of the pack because it counts; people and groups with less impact but other good work can believe they’ll come out ahead of the pack, or at least not too badly. The crucial thing here is that it is essentially a system that no entire group will protest against, nor wholeheartedly accept, because each group fears that whichever way it goes, their chances will be slimmer. At least if it were all about impact, we would know what to protest against.

    The crucial problem though, with grants as with REF, is twofold: (1) the increasing measure of scientific quality via quantification of scientific output, (2) the increasing importance put on getting third-party funding. Both issues have been touched upon regularly in the past years, and I need not go into the rank evil of the publication insanity that has gripped us all for a couple of decades now (which would be ok if it weren’t toxically combined with the whole no-null-findings or no-replications issue). But there’s also the fact that not all research needs massive amounts of cash, and one may in fact wonder whether the time spent writing largely unsuccessful grants (as the acceptance rate is 1/5 or less) to hire people could not be spent more efficiently doing the research yourself (as most research, one way or another, gets published at one point). Nonetheless, Grant Writing per se has become a quantifier of academic output or success. Well, it isn’t. Unless you take success at grants to be indicative of impact as grants are based on it. You might say, “Those grants, however, serve to train people!”. Indeed, an increasing number of people do a PhD, which is becoming more and more the 3-year “higher diploma”. And only a fraction of those end up in academia. So what are we in fact doing? We are spending our research time writing more miss-than-hit applications to get money to train people who will find a job outside academia. REF-wise, we are spending time on a huge evaluation that does not bring us insights, but the outcome of which may veil the subsequent allocations in a shroud of deliberately fuzzy objectivity that everyone can live with, simply because it’s very difficult to protest against.

    In short, it is a vicious circle of Money having become such a massive issue and the fact that somewhere someone thought this money should be coupled to quantity of output, but not too explicitly. And we, scientists, all go with it. Because in the end, our vanity wants to prove our h isn’t as bad as that of the person in the next office? Because we are, in fact, bootstrapping ourselves into a job security that is best served by a combination of output quantification and a vague “rim of fat” that is the bible that we have to write around it each time?

    Don’t ask me how the money should be allocated, but every new system seems to get completely out of hand in the span of a decade. People globally seem to suffer from rank-itis. It’s as if we have collectively lost confidence in our capacity at detecting quality.

  10. Dave Fernig says:

    Apart from reducing nepotism, why have the REF? The RAE had a role to play in the past, which I blogged about, but several decades on, the push to a meritocracy and the dissolution of old boys’ networks is best achieved through other means, e.g. Athena SWAN. So we could happily cancel the REF, put the cash into an overhead on RCUK grants and studentships, and get on with our work.

    • stephenemoss says:

      Scrapping the REF altogether would undoubtedly be the best way forward – surely the tax payer would rather see our talents applied to research and teaching than to this monstrous bureaucratic edifice that expands in complexity with each quinquennial reincarnation. And this is something that Athene touches on, that the first RAE (as it was way back then) was significantly simpler than the many-headed beast that confronts us now.

      I often wonder why change seems to automatically equate with increased complexity, be it the REF, grant application forms, health and safety, indeed any aspect of academic life. Changes to the way we do things are inevitable, and presumably are mooted and accepted because they are deemed to ‘make things better’. But what law is at work that determines that ‘making things better’ invariably means making things more complicated and more burdensome?

      Before the next REF (and even in my most optimistic moments I cannot see its demise) there should be some meta-analysis, as suggested by Dorothy, as to whether a simple metrics analysis could be used that would yield the same result. If so, then we might dare to look forward (maybe for the first time) to ‘better’ meaning simpler.

  11. Colin Macilwain has a piece in Nature about Snowball metrics. He is not impressed.

  12. Helen Czerski says:

    From the point of view of a younger academic, the whole thing seems close to criminally wasteful. Like Athene says, it’s all come about because no-one felt able to trust academics to do their job. I cannot believe that

    no REF + a few uncaught coasting academics + a trusting environment

    is not better on every measure of productivity than

    REF + vast time-wastage + stressed and defensive academics.

    It would be amazing if anyone was able to do a controlled experiment on this and actually get some data…

  13. I just tweeted this. It should be obvious that this sort of thing will happen, but it seems not to occur to bean counters.

    Paper with 1100 citations can’t be replicated. So much for citations as measure of quality http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0072467 via @StuartRitchie

  14. Well, I can only say that those commenting are just illustrating my point: we have ourselves to blame for the system.
    My point is that research shows that in scientific subjects, a particular metric predicts the RAE outcomes with remarkable accuracy. The response from academics: an attack on metrics!
    If you think the last RAE results were unfair/inaccurate, then OK. But if you think it did a reasonable job in sorting out rankings, then I think it is an empirical issue whether the same result could be obtained more efficiently. It doesn’t matter whether you LIKE metrics, or can see cases where they seem unfair: it’s a question of whether they give any worse a result than the system we currently have. We need a cost-benefit analysis of different approaches here, and that seems remarkably hard for people to grasp. So we end up trying to devise some kind of ideal system, totally disregarding the costs. And I agree with Helen that if we looked at costs we might decide to live with a system that was imperfect but good enough.
    OF COURSE there are numerous arguments against metrics, but if you are rejecting them, you are also in effect rejecting the processes used for the RAE, which gave virtually the same result.
    What about other options?
    1) Like David I am old enough to remember the times before there was any research assessment exercise. Problem was, money was doled out in a way that was not at all transparent, and there was no incentive to try harder. Unless you were to have a system where all universities were given a block grant based purely on size (and I suspect we can all see the problems with that) you need some way of determining who gets what in a transparent and fair way.
    2) Even more radical would be to ditch central funding of research and allocate funding solely via the research councils – which seems to be what David is suggesting. I believe we’ve been moving in that direction, with a higher proportion of funding coming through the grants system. The limitation is that this does not make it easy for universities to plan ahead and develop large-scale infrastructure projects. But maybe a way could be found to overcome that.
    I despair at the way the UK vice-chancellors have totally failed to tackle this issue. Contrary to popular belief, the REF is not a system that has been imposed on us by faceless bureaucrats; HEFCE have, indeed, been assiduous in working with academics to try to develop a system that we would find acceptable. The system we have got is the one that was created in response to criticisms of previous incarnations. Any move to simplicity has been repeatedly rejected and considerations of cost-effectiveness are totally ignored. We really do have only ourselves to blame.

    • I may be misremembering, as it’s a while back, but I think the whole ‘impact’ aspect got introduced when some MP noticed that working with government got academics no credit in the system as it was. So a completely new strand was introduced, for which there are no particularly relevant metrics to establish a pecking order. Impact is a multi-faceted beast and, although it is frequently translated as economic impact, that isn’t the whole truth by any means. But it is of course easier to ‘measure’ economic impact than some other things. As long as impact is seen as an important factor, your H-index approach is unlikely to be accepted by Ministers et al as sufficient.

      Additionally, the discussion of environment has now been expanded to cover more aspects, again many of which are hard to quantify. Last time’s rubric meant that much could be gleaned from hard numbers (PhD students, income etc); now we are asked to address the less tangible as well. But many of these intangibles are important too – nurturing ECRs, diversity etc. Would you throw all these out too when making decisions?

      Although it is the outputs – for which your H-factor analysis is most relevant – that carry the weight of the funding aspects, they aren’t all that matters by any means. I’m not sure what you feel about these other aspects. I certainly would worry if the ONLY thing that mattered were an overall H-factor: there would be a real push to support superstars and perverse publishing strategies, to ignore ECRs, and to hell with any decent working environment for students. That would be a distortion in a different but most unwelcome direction.

      Furthermore, as I learned to my cost through my work on the Physics RAE panel, although the analysis you suggest may give a similar rank order, what about comparability between disciplines? Physics, internally (and for all kinds of reasons it would be pointless to rehash), marked lower, overall, than chemistry. That had all kinds of ramifications for funding. I am not at all sure how your metrics route would resolve that. It didn’t work last time, and I can’t see it would work using metrics alone, since disciplines are so very different, as I’ve said above. To my mind this is a major issue with ANY measurement. How can one say physics is better or worse than chemistry – or psychology or any other subject? And yet, ultimately, funding depends on that comparison. Right at the start of the RAE discussions last time this was an issue which never really got resolved. I daresay it has been in the REF too, and I simply don’t see how it can be addressed. I assume reducing the number of superpanels is meant to help, but it is still going to be a major challenge.

      I am not giving you a solution, I know – merely rehashing the problems. I don’t see a solution, because I am coming to believe any quantification is crude and inefficient, and yet gives succour to those politically minded who want to say ‘we hold academics to account for their money’. Your metrics, the REF, the RAE – they are indeed all flawed, and so perhaps we should simply be fighting hard against the unseen costs of using up academics’ time unproductively. This time cost must be horrendous.

  15. Kathy Rastle says:

    I’m in the throes of preparing my second RAE/REF submission for my department, so I sympathise with Athene’s original comments about waste and inefficiency. But, as Dorothy says, academics brought this on themselves. I seem to remember five models of metrics-based solutions being proposed, all of which were rounded on because, amongst other things, they didn’t always show Oxbridge coming top!! There was also a consultation on the impact proposals, to which the responses from institutions were broadly favourable.

    Sometimes I think that I may be the only person in the country who actually likes the REF. We have to remember that this isn’t an exercise about “trusting academics”, but is a process to distribute taxpayer money fairly. In the very first RAE, my own department received the lowest possible score. Successive RAEs have allowed us to develop a strategy to improve, and we have invested the extra QR money that we have received year on year into realising that strategy, hiring great people, and building a better department. The last RAE placed us 7th in the country out of 100 or so departments. While the REF can put unwanted stress and anxiety on people, particularly as all of our careers have highs and lows, my experience is that becoming an achieving organisation through successive RAE/REFs has been beneficial to *everyone* in my department.

    I do hope that there will be reconsideration of metrics in future years, at least for the sciences, if some metric-based solution can be shown to do the same job as expert panels. Years ago, Andy Smith and Mike Eysenck showed that for psychology (which like Physics has large differences in citation rates across sub-disciplines), there was a very high correlation (on the order of .9 if memory serves) between citation rates for a department and RAE rating. Citations aren’t always going to capture quality, but does anyone really think that panel members with hundreds of papers to assess will be infallible? We have to ask ourselves whether they add that much value to the process as to justify the huge amount of work associated with the preparation and assessment of submissions.
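The department-level h-index that Dorothy's and Kathy's comments both refer to is cheap to compute; a minimal sketch, with invented citation counts (the department names and `h_index` helper are mine, purely for illustration):

```python
def h_index(citations):
    """Largest h such that the unit has h papers with at least h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Invented pooled citation counts for three hypothetical departments.
departments = {
    "Dept A": [25, 18, 12, 9, 7, 4, 1],
    "Dept B": [40, 30, 22, 15, 11, 8, 6, 5],
    "Dept C": [6, 5, 3, 2, 1],
}

# Rank units by h-index, highest first - the cheap ordering that the
# studies cited above found correlates strongly with panel outcomes.
ranking = sorted(departments, key=lambda d: h_index(departments[d]), reverse=True)
print(ranking)  # ['Dept B', 'Dept A', 'Dept C']
```

This is the "assembled in a week or so" calculation: the data are just each unit's citation counts, and the whole comparison is a sort. Whether the ordering it produces is fair is, of course, exactly what the thread above disputes.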

  16. This is a slightly frustrated comment, because I can’t help but think that some of the arguments above are missing Dorothy’s point.

    1) The idea that we should abandon the REF and allow academics to (somehow) self-regulate is a pipe dream and a nonsense. It is absolutely never going to happen, and wishing won’t make it so. We are spending taxpayers’ money, so why shouldn’t we have to account for that money? The ivory tower is no more.

    2) Dorothy absolutely nails this when she says that we have ourselves to blame for the REF in its current form. On the one hand we moan about how burdensome the system is, but when she shows that the same outcome can be predicted with remarkable precision based on a simple metric alone (let alone a more sophisticated combination of metrics), we attack the philosophy of metrics! Many of those commenting are scientists whose job it is to quantify complex natural systems. Why should our goal in judging short-term academic impact/quality be any different? Why can’t we put our heads together and develop an intelligent metrics-based mechanism? Why are we only seeing the downside of metrics?

    There almost seems to be a sense in which we think that the current REF measures quality by preserving “subjectivity” – like a ghost in the machine. Let’s take star ratings of publications. Everyone I know in every university I know is using journal impact factors to select their four REF submissions. It makes no difference that impact factors are meaningless and that the REF officially disavows them. The only fact that matters is that JiF correlates highly with perceived journal rank. Since nobody making a REF submission expects anyone to actually read their papers, they do the rational thing and select based on journal rank (hence JiF).
    And we have evidence that goes beyond my anecdotes: http://www.theguardian.com/science/occams-corner/2012/nov/30/1

    This doesn’t just apply to selecting our four individual submissions. I have colleagues who were handed >100 publications to assess as part of various mock REFs. Some of these papers were more in their area of expertise than others. Did they read each one and offer a considered assessment based on their experience? Of course not, that would take months. They looked at the journal names (read ‘rank’), skimmed the abstracts, waved a thumb in the air and hazarded a best guess. The notion that we can, en masse, rely on subjectivity as providing some kind of divining rod to measure the quality of science is a delusion that is incompatible with the short timescale of REF and the enormous workload it involves.

    At the moment, we have the worst of both worlds: a REF that some of us believe measures “quality” by including vaguely defined subjectivity but which, in fact, leans heavily on one really *bad* metric. I think we’re fooling ourselves if we believe this is anything other than an opaque and irrational version of what Dorothy is already suggesting. And it costs a bloody fortune.

    Given that the REF is going nowhere, my view is that we either seize the opportunity to simplify the REF by building it around intelligent metrics – thus saving huge amounts of time and money – or we stop complaining and celebrate its bloated, expensive, JiF-based perfection.

  17. Athene: well, thanks for engaging on this issue; I am delighted that we are having this debate. But you continue to make my case for me. You want to include “intangibles” and things that are “hard to quantify” in the assessment, but at the end of the day this exercise is about quantification. The ultimate result is going to be a single number that determines the funding for that unit of assessment. So how do we get from the intangibles to the numbers? Who decides? The whole issue of case studies is all about presentation, which is why so many universities are hiring PR firms to write them – and the rich universities have more money to throw at the problem than the poor ones. I remain cynical about the potential for this exercise to improve the fairness of funding allocations.
    We will have to wait and see whether predictions from departmental H-index are as good for REF2014, with all its intangibles, as they were for the RAE. My guess is that the intangibles are not going to add much – and not enough to justify the costs of assessing them. If, at the end of REF2014, we were to find that a simple metric accurately predicted the result (again), would you change your mind?
    I really don’t want to bang on about how wonderful a departmental H-index is, as of course it has flaws. But it’s a reasonable proxy for how research productive a department is, and it is historical, and tied to the department, so has the advantage that it would remove all incentives for the last-minute poaching of research stars that you expressed concerns about. It should lead to departments thinking more long-term, and hiring promising people early in the assessment period and then giving them every support to publish one or two stellar papers in that period.
    Comparability between disciplines is a separate issue that will arise under any system. Perhaps we need to remind ourselves of WHY university departments need core research funding: it is because research costs money. If neuroscience costs more money than cognitive psychology, for instance, it does not mean that neuroscience is better: it means that it is more expensive, and more core funding is needed to support things like brain scanners than laptop computers. We need to get out of the mindset that equates research income with research quality, and consider instead which subjects are worth supporting, and give them the funding needed to support the relevant infrastructure. Unfortunately, most vice-chancellors don’t see it that way, and instead regard research income as some kind of badge of success, and so deprecate cost-effective subjects.

  18. Chris
    1 Firstly, it is precisely because government is wedded to impact that we cannot simply rely on the H-index metric Dorothy favours. I suspect they probably care less about environment than academics do. So yes, it is a pipe dream to think we will ever escape being judged by some such system, but I refuse to believe a metric based solely on outputs would, or should, be sufficient.

    2 Yes, Jenny got some information on the use of IFs etc through her blog, but I can also tell you that whatever universities may do (and that isn’t how things are done locally, from my perspective; Jenny was more likely to get responses from the people who are suffering from crude IF use than from those who aren’t, so we don’t have real numbers to judge by), panels may not follow suit. On the basis of my RAE experience I promise you that things are not done so crudely by the readers. I did lose an entire summer to reading outputs. I did put a lot of energy into it and did more than the crude approach implied here. Honest! I have written about this before.

    Again, you may know places that are using PR agents to write Impact cases. I don’t. Certainly in the areas I am familiar with in Cambridge that is not happening, however much money you imply well off universities can throw at it. Maybe Oxford is doing so, you would know better than me. Secondly, I do have experience from the REF Pilot. It was the STEM subjects (Physics and Earth Sciences) that were most opposed to the way things were being done about impact. It seemed the Humanities became less worried whereas scientists became more so. Nevertheless, one of the good things about having people reading the studies is that they can see through spin. Leaving out stuff – which was one of the things my original post was fretting about – is entirely different from dressing mutton up as lamb, or whatever analogy you want to use. Furthermore, people on the panels do know the field and will mentally query claims they know to be false. At the very least it will raise questions – either for audit or for panel discussions.

    I feel very uneasy as to how the REF is being done and whether we will get ‘good’ numbers out. I do have experience of being on the dark side, i.e. on an RAE panel (and as far as I know I am the only commenter who does), and keep trying to point out that there are myths out there about how things were done. On the other hand, I do not like the idea of using a metric that relies solely on outputs and H-indices. No other metric has yet been proposed here, and Dorothy has not addressed the fact that distortion of departmental culture to the detriment of ECRs might result, as David and others have pointed out. That can’t be a good outcome for the health of the discipline either. I would rather lose my summer to this burden than feel that we were damaging the future opportunities for those who are starting out.

  19. Bert Timmermans says:

    Aren’t we forgetting something? True, there has to be a system that somehow guarantees (in as far as that is possible) that taxpayer money is well-spent. A non-REF complex metric sounds excellent. But the major question should be whether the incentive structure that such a system puts in place is desirable:

    As it is, the entire discussion is about post-hoc evaluation, whereas the discussion should be about incentives: researchers are people and despite the fact that we’re in it for the fun of it, people will try and reverse-engineer the system/metric. An intelligent metric will combine a measure of output with a measure that, if maximised, enhances scientific practice.

    I mean, how can you expect scientific research to stray off the beaten track if such straying means a high risk in publishable outcome? True, one can argue that this perverse effect of publication-related measures is primarily due to the positive results bias, which no REF or metric can amend. Or can it? Precisely here lies the potential of creating a more complex metric: it is a great opportunity to create an incentive structure that incentivises good science, rather than mere “output” – which, if Dorothy Bishop is right, is what it does now.

  20. Athene and David. Just to clarify. An H-index computed over all time for an individual (which is how the H-index is usually used) will indeed be higher for someone who has been around a long time than for a more junior person. Everyone knows that, and it is usually taken into account in circumstances where H-indices are used.
    An H index computed for papers published from an institution over a particular time period would not suffer from this problem.
    I am sure that forms of gaming would occur if this metric were used because the stakes are high and gaming always occurs when that is the case. But overall, I see this system as offering more protection for ECRs than the current one, which singles out people who are thought not to be publishing enough in high impact journals.
    And, to return to my earlier point, we don’t need a perfect system, we need one that is good enough. We are, after all, determining a single numerical scale from which funding is determined.
    I do think we have rather lost sight of what we are trying to achieve with REF2014. It is not a perfect description of research quality in every department of every UK institution: it is a funding formula that needs to be seen to be as fair and transparent as possible. If most people think University A is better than University B, but A gets 50% less funding, we probably have a bad system. If most people think both A and B are pretty good, but A gets 2% more funding, I would say we could live with that – especially if the cost of more precision was months if not years of highly-qualified people’s time.
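    For concreteness, the departmental H-index Dorothy describes is simple to compute: it is the largest h such that at least h of the department’s papers from the assessment period have at least h citations each. A minimal sketch (the citation counts below are invented purely for illustration):

    ```python
    def h_index(citations):
        """Largest h such that at least h papers have >= h citations each."""
        counts = sorted(citations, reverse=True)
        h = 0
        for rank, c in enumerate(counts, start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    # Hypothetical citation counts for every paper published by one
    # department during the assessment period:
    dept_citations = [120, 85, 60, 44, 30, 22, 15, 9, 5, 3, 1, 0]
    print(h_index(dept_citations))  # -> 8
    ```

    Because the index is computed over the department’s papers in a fixed time window, rather than over an individual’s whole career, it sidesteps the seniority problem described above.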

  21. Dave Fernig says:

    Dorothy, I had seen your h-index calculation before and was most intrigued, and, if I recall, physics follows the same pattern. Chris is right in stating that measurement is here to stay; at least until such time as Parliament decides that we need to know less about the cost of things and more about their value, but that is some way off!
    As I have noted before, one problem is the geometric relationship between cash and grade. This drives the game playing and other foolishness of institutions and tempts senior management into thinking they can “fix” problems by simply removing a proportion of non-returners. It is also the reason why universities never accepted a broad metric – each was doing calculations and trying to figure out how they might grab more cash. Each of our institutions is also paying for a substantial number of administrators to steer the whole REF process. These staff are not engaged in teaching and research.
    Though Chris states universities have used impact factor, this is not universal. In our UoA at Liverpool, at least 8 papers for each member of staff have been read by two senior academics (~ 1600 papers in all…) and we have plenty of evidence that impact factors are not a reliable guide!
    So we have made our bed and need to seriously consider how we might proceed. A departmental h-factor plus impact statements would be a lot less work and drain less financial resource. Given a less steep relationship between cash and metric, gaming may be less important.

  22. Dorothy
    I rarely disagree with anything you say, and I see your point. But there is a lot of money at stake. If the assessment were based on H-index, or any other metric, universities would put huge effort into maximising it, by any means, honest or otherwise. That means that individual academics would be put under pressure to maximise their H-index. That would do huge harm. Sensible though your argument sounds at first hearing, I think it’s inevitable that its long term consequences would be quite disastrous.

  23. But David, what do you think is happening at present? Academics are *already* under pressure to maximise citations. The H-index is widely used in hiring and firing decisions. Read anyone’s CV these days: most of them will mention it. (I see this as far less of a problem than the CVs that report the impact factor of the journals they publish in, which is another current trend, encouraged by some institutions).
    Here’s why I think that, if you are looking for a metric, the H-index may be the least bad. Maximising the H-index amounts to putting one’s efforts into work that will be highly cited. Yes, you can get cited for all kinds of wrong reasons, and not all highly cited work is good. But this would reduce the pressure to publish in glamour mags and it would encourage a focus on producing a smaller number of solid papers: lightweight, pot-boiler papers that don’t get cited are not worth spending time on because they don’t affect the H-index.
    And perhaps the key point, which I seem to have trouble getting across, is that when a group of academics on an RAE panel eventually disgorged their ratings after months of preparation and deliberation, they correlated very highly with the department’s H-index, provided this was a science subject. We are told that these ratings were based on reading the papers and judging them for quality. So we do have evidence that the H-index correlates well with subjective judgements of quality by experts.
    I accept that if HEFCE were to overtly state that the H-index would be used in future, people would think of devious ways of trying to maximise it. As soon as a measure becomes used as a means of allocating funds, people will try to game it. So how would they do that? I can think of three ways:
    a) Getting your name on a paper you’ve not contributed to would be one way, but if the department is the unit of measurement, this would only work if you got your name on a paper originating from another institution: multiple authors from the same department would make no difference. Given the competition between universities, you might find it hard to find someone from another institution to put you on their paper if you’d done no work on it, so you’d have to get yourself spurious authorship on papers originating overseas.
    b) Within a department, I can see a culture growing up whereby everyone would be encouraged to cite one another’s papers. Journal editors and referees would need to be vigilant to spot where spurious citations occurred. But this would also be detectable. Just as it is currently possible to distinguish self-citations from citations by others, it would be relatively easy to compute an H-index for all citations, and another for citations from outside the institution only.
    c) Scientists may be tempted to engage in all sorts of dubious behaviours to achieve publications with a ‘wow’ factor. But it was ever thus, and I doubt that using an H-index metric would make any difference. The careful scientist who spends months or years getting something right is overlooked, whereas someone who rushes into print with a striking finding based on hasty and careless work may get more rewards, at least in the short term. To remedy that, we need changes in scientific practice elsewhere in the system, with more of an emphasis on replicability of results.
    Anyhow, we come back to the problem that we need some way of distributing funds that is fair, transparent and cost-effective. I despair that we will ever achieve this, because academics seem unwilling to accept any solution that has evident imperfections, and perfection is unattainable.
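    Point (b) above – that institutional back-scratching would be detectable – could even be checked mechanically, by computing one H-index over all citations and another counting only citations from outside the home institution. A hedged sketch, using a made-up data structure in which each paper carries a list of citing institutions (real citation databases would need more careful affiliation matching):

    ```python
    def h_index(citations):
        """Largest h such that at least h papers have >= h citations each."""
        counts = sorted(citations, reverse=True)
        return max((rank for rank, c in enumerate(counts, 1) if c >= rank),
                   default=0)

    def external_h_index(papers, home):
        """H-index counting only citations from outside the home institution.

        `papers` maps paper id -> list of citing institutions
        (a hypothetical format, for illustration only).
        """
        external_counts = [sum(1 for inst in citers if inst != home)
                           for citers in papers.values()]
        return h_index(external_counts)

    papers = {
        "paper-1": ["Leeds", "Bristol", "Cambridge", "Cambridge"],
        "paper-2": ["Cambridge", "Cambridge", "Cambridge"],
        "paper-3": ["Oxford", "UCL"],
    }
    print(external_h_index(papers, home="Cambridge"))  # -> 2
    ```

    A large gap between the all-citations index and the external-only index would itself flag the mutual-citation culture the comment worries about.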

  24. Kate Jeffery says:

    I’ve really enjoyed this discussion. I think the arguments on both sides have been well made. Metric measures like departmental h-index are quick, cost effective and (from Dorothy’s fascinating analysis earlier) seem to arrive at the same result as reading everyone’s papers in detail. On the other hand they are crude, and miss the nuanced contributors to quality that can be unearthed by subjective analysis from experts in the field. The latter, however, is expensive in manpower and ill-defined (though I’ve suggested, half-seriously, a way around these problems – http://bit.ly/15HfgRA). I find myself a little bit stuck in the middle in terms of what I think we ought to do next – but agree the current system is unsustainable.

    If we’re going to come up with a different system I have just one plea – that we take into account value for money in an institution’s outputs. A university with a high h-index will probably have consumed a great deal more in terms of research funding, so is their index fairly reflective of this increased input? Does the institution add value? The current unto-those-that-hath-shall-be-given system inevitably results in steady sequestration of resources by fewer and fewer institutions, which may not necessarily be value for money from the perspective of the taxpayer. A similar argument applies to assessing individual scientists – right now we judge the output of a lab head overseeing five postdocs on the same scale as one overseeing a single postdoc, which is not really fair. What we really want to know is how many citations (or whatever output measure we decide on) we get per taxpayer’s pound.

  25. Dorothy
    You say

    ” Academics are *already* under pressure to maximise citations. The H-index is widely used in hiring and firing decisions.”

    I wish I knew how true the second sentence is. When I was on hiring committees, we used to ask candidates to nominate their 2, 3 or 4 (depending on age) best papers. Then we read them and asked the candidates questions about them, especially the Methods section. It was surprising how little some candidates knew about their ‘best papers’. It’s a good way to root out guest authorship.

    If the aim is to prevent the use of numerical metrics to assess individuals (that’s my aim anyway) then it can only harm that aim if you make the metrics the way in which universities are judged. You would simply increase the respectability of disreputable methods.

  26. REF Champion says:

    The discussion about impact factors misses the point: the REF panels are going to be provided with citation data from Scopus. They will have to “read” a few thousand papers, which is quite simply impossible. So it seems pretty obvious that citations (not impact factors) are going to play a big role in assessing the outputs. We had papers sent to external assessors who came back with wildly different assessments, demonstrating that metrics-based assessment is the only reliable and scientific method to judge quality (however flawed).

    I seem to remember that a few years ago HEFCE looked into using citation data to assess departments, and that it was the physicists who were dead set against it. As far as I could tell this was because in astronomy and particle physics it is common to have one big-shot prof with an army of minions. The minions would only be able to submit crappy technical papers and would therefore perform exceedingly poorly in a purely metrics-based assessment.

  27. Is the H index widely used in hiring and firing decisions? I fear the answer is more than it should be – already! And if it were the basis of future funding it could only get worse. I’ve written previously about how dangerous using such a metric in a naive way can be. People certainly would game in all kinds of horrendous ways. In a way having a ‘simple’ system just encourages this sort of behaviour, because everyone can see how to maximise the gains. But there are other dangerous things about having an H index as the basis of good behaviour.

    Evidence suggests that gender comes into play. Women are cited less, according to this study and publish less, according to this one, although how widely these apply across all disciplines isn’t clear from these limited studies. H index is also coloured by which database you use, something colleagues in my department get very exercised about (apparently both Astronomy and Particle Physics don’t feature properly in standard databases for reasons I’m not sure about).

    Finally, if different sub-disciplines have different publishing strategies – as they undoubtedly do – then maybe we will find, for instance, all chemistry departments hiring only synthetic chemists rather than physical chemists, since H-indices in the former tend to be higher than in the latter due to the nature of their publishing. Additionally, I suspect they may bring in larger grants on average. That would be game playing with a vengeance, doing huge damage to the subject.

    I know Dorothy has done analysis of a couple of disciplines and found the correlation between her metrics and actual RAE scores to be reasonable, but I would hate the idea of no tensioning being carried out by panels peopled with a spread of expertise. And all the problems of checking who was in your UoA on the census date, and that all their publications had exactly the correct form of address to show up in the analysis, would remain. It wouldn’t actually turn out to be that light-touch at all if people were to have confidence in the numbers generated. So much of the time I’m currently expending on checking things through would still apply.

  28. Paul says:

    The major problem I have with the REF is that academic outputs submitted can have no relation to the department that submits them: the work may all have been done elsewhere, and new colleagues are poached as “REF hires” for their past outputs. It would be better if there were a requirement that the department submitting a paper for the REF should at least be listed in the institutional affiliations on the paper! It might stop some of the hiring shenanigans that go on, prevent short-termist hires based purely on the returnability of recent outputs, and put the emphasis squarely on long-term potential (not always the same thing!).

    All attempted simplified metrics/rankings of highly complex and nuanced systems, such as the performance of individual academics or academic departments, are guaranteed to be flawed. So sometimes I wonder why we bother. Mostly we are just playing the game by the rules placed in front of us, but all these flawed metrics feel like such a dirty and unsatisfactory pastime for astute and intelligent academics to be participating in – particularly when accuracy and definitive answers are so fundamental to our own fields of research!

  29. I just looked up my H-index for the first time ever.

    First problem. Web of Science says it is 54 but Google Scholar gives it as 63 (or 29 from 2008, which is 4 years after I “retired”).

    The difference could arise in part because Google Scholar gives the number of citations to my book chapter on single channel fitting methods as 1063 (my highest), whereas Web of Science finds no citations of the same article, and none to my book (Lectures on Biostatistics), which has 899 citations on Scholar.

    None of these is real experimental research. The highest-cited original papers come in at 878 and 715 citations for two not really great, and very short, papers, and 680 for a really trivial vacation project. They just happened to come along at the right time. About the same (698) for a 1998 review that seemed fairly trivial at the time (it explains things that I thought everyone would know, but apparently not). One of my best ever experimental papers (57 pages, and my only one to be nominated as a classic in the field) comes in with 690, quite respectable, but behind the methods chapter, the book, the review and two very brief and, in retrospect, fairly trivial papers.

    Among theoretical papers, a 1981 paper that worked out the theory in a rather clumsy way has more citations (563) than a far more substantial paper that treated similar topics much more elegantly in 1982 (450). Two of the most crucial papers for all our subsequent work, published in 1990 and 1992 had only 87 and 88 citations, but they had some seriously hard maths (the exact solution of the missed events problem).

    How can anyone think of trusting their fate to statistics like that? The two major sources can’t even agree on the numbers. Among the top handful, the correlation between number of citations and quality and originality is execrable (though of course there is a long tail of rarely cited things, many of which deserve it).

    The lesson is that, if you want to get on in the brave new world of metrics, don’t bother with original experimental work – spend your time writing reviews and methods chapters. And certainly don’t write anything too mathematical.

    • Mike Clark says:

      David is right when he points out that different databases give you very different H-index scores, and that is another problem when it comes to blindly applying metrics. The problem is that the REF has many different categories of relevant outputs that are simply not collected and analysed by some of the traditional scholarly databases from the likes of Elsevier. In my experience Google Scholar certainly seems to give the highest numbers of citations because it seems to mine data more widely from across the web. Regarding my own area of translational research into therapeutic applications, another advantage is that Google Scholar collects citations from patents, which most of the others fail to do. When filing a patent, particularly a US patent, failure to cite the relevant prior art can be grounds for invalidating claims, so there is an incentive not to overlook any relevant publications. In contrast to Google Scholar, however, I find that Microsoft’s attempt to compile academic publications is appalling, certainly with regard to my own CV. Microsoft seem to be incapable of reliably compiling disaggregated data for individuals with similar names and initials.

    • REF Champion says:

      The REF does not allow reviews, so what’s the problem?

      • Bert Timmermans says:

        The point was not about what the REF is now; the point was about whether it should be replaced by a metric based on citations. But I suppose one could use a metric with self-citations removed (removing own-institution cites is absurd, because people often switch institutions pre-REF), and with reviews removed (though that’s not always so clear-cut). That doesn’t solve the problem of First Big Papers being cited more, or the disincentive to double-check any existing effect.

    • David:
      It is widely recognised that different databases give different H-indices, but that’s really not relevant to the argument. If you are using an H-index to compare institutions you just stick to the same one for everyone. In fact, citation counts will be provided to the REF2014 panels to use as they wish: these will all come from the same source.
      But to the main point: When you argue so valiantly against various manifestations of alternative medicine, you have to deal with lots of people who say “well, it worked for me” or whatever. Hearing your arguments feels a bit like that – you are picking out specific instances and saying that, because of these, the metric is flawed, and, like homeopaths ignoring clinical trials, you are ignoring the existing evidence that shows that, in aggregate, the metric is a good predictor of rankings produced by the kind of expert judgement that you like.
      There are really two things at issue here:
      1. How well two systems of ranking agree – one that could be done by a few people in a couple of weeks, and another that takes many people a couple of years. This is an empirical question on which we already have quite a bit of data.
      The parallel with clinical trials can be continued here: yes, among clinical trials there will be people who get worse on the medication, who have side-effects, etc etc. And yes, we’d like to understand why that is so. But if the medication is treating a serious condition, and the decision is whether or not to introduce it now, we would use the overall treatment effect to guide our decision. What I am finding frustrating here, is that you and others are just ignoring the strong predictive power from metrics to RAE scores. If you want to dispute that evidence, then fine. But to discount it seems a bit perverse.
      2. The big issue is that of unintended consequences. Most of these arise because adoption of any measurement process will lead to strategic behaviour by institutions, and the consequences can be damaging to good science. There’s plenty of evidence that the last RAE and current REF2014 have altered institutional hiring and firing policies, introducing an incentive structure that many regard as detrimental. I think we need to look carefully at likely impact of any evaluation process on incentives and behaviour. But, here too, it is not sufficient just to look at a proposed new system and note that bad things may happen: what is needed is to compare the new system with the current system, to see if it is better or worse.

  30. Bert Timmermans says:

    I think David has a crucial point: citations go to reviews, meta-analyses etc. (Why does Behav Brain Sci score the highest SSCI IF, and why does Nat Rev Neurosci have a higher IF than Nat Neurosci?). And to the first paper on the issue, the Big News. Don’t misunderstand me, the meta-analyses and reviews are necessary, but so are the replications and the non-replications. But why should I bother to replicate anything when, if it gets published at all, it gets me fewer citations than the original, certainly once my study gets incorporated in a meta-analysis? There seems to be a confusion between advances in scientific ideas and Important Papers. The latter contribute to the former, but the former aren’t carried by the latter alone. Citation hunting, while better than # pubs or JiF, leads to unreplicated Big News papers or reviews. Where is the incentive to sink your teeth into an effect? True, the skew of citations towards the Big News papers is not the fault of the metric, and neither is the replication issue. But using a citation metric means both being blind to these current problematic issues in science and strengthening them by providing the wrong incentives. Striving to get cited is good, but sometimes you have to be able to go for something whose citation value is not an a priori given.

    • If, as Dorothy asks us to notice (with increasing frustration?), RAE/REF panels and consolidated h-factors come to the same rankings, it is likely that the incentives are quite similar… and indeed the problems you describe are very real in the current system. To address your point, we would need a third evaluation system, one that is lighter while providing different incentives. What could it be?

  31. Paul Cairney says:

    I went with a group to meet some HE people in Sweden last year. They were gearing up to spend much more on ‘basic research’ and to distribute the money based on metrics alone (citations and previous income). It may be worth asking Swedish colleagues about the effect and how it went down.

  32. Dorothy
    Can I check if you are suggesting a review based on nothing but this sort of H-index-based metric plus an analysis of grant income? You would not want to see a panel involved at all? No discussion of training, equality issues or anything to do with relevance of research through impact? I reiterate that actually making sure entries – for outputs alone – are accurate would be fairly time-consuming for departments (or require some sort of draconian measures about checking addresses, initials etc at paper submission time) although it would certainly do away with the need to read the material – within the department or by the panel.

    • Athene
      I am suggesting we take an empirical approach and use anything that is simpler than the current system and predicts the same outcome.
      So far it looks like some kind of index based on departmental-level citations would work well – wouldn’t even need the grant income bit (which can be problematic too, as it can lead to bias towards expensive rather than cost-effective research).
      Anyone who wanted to add further criteria would need to demonstrate that they actually made a difference to the final outcome.

  33. One more point. A correlation coefficient of r = 0.9 sounds pretty good, but for the purposes of prediction it is far from perfect. It is better to think of the fractional reduction in the prediction interval for Y that comes from knowing the corresponding value of X. This is 1 – sqrt(1 – r^2). Thus r = 0.9 only roughly halves the prediction interval.
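To put numbers on this (an editorial illustration, not part of the original comment), the reduction 1 – sqrt(1 – r^2) can be computed directly for a few values of r:

```python
import math

def interval_reduction(r):
    """Fractional reduction in the prediction interval for Y
    obtained by knowing X, given correlation coefficient r."""
    return 1 - math.sqrt(1 - r ** 2)

for r in (0.5, 0.7, 0.9, 0.99):
    print(f"r = {r:.2f}: prediction interval shrinks by {interval_reduction(r):.1%}")
```

Even a seemingly impressive r = 0.99 leaves a prediction interval about one seventh as wide as with no information at all, which is the point being made: correlation and precise prediction are different things.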

    • John King says:

      But who’s to say Y was correct in the first place? I seem to remember a lot of grumbling after the last RAE. In fact there’s a third variable, the “worth of research outputs” that is being assessed and we don’t know which is more closely related to that, some H-metric or the RAE decision data.

  34. Paul says:

    Hey, if we simply went on citations, we could update the league tables every Friday. Now wouldn’t that be brilliant?!

  35. John King says:

    It’s worth noting that all of the possible solutions to the allocation of funding based on some evaluation of research output are going to be wrong. Sources of error include all the usual ones in social science, then there’s assessment of the data by committee, and ultimately the commonly-observed problem with judging impact (who can predict what will have impact down the decades?).

    So the answer seems to be more about which is the *least wrong* answer, and I agree with Dorothy that when choosing a methodology it’s necessary to include in the judgement the massive amount of work that so many highly-trained people are having to put into the REF. If Dorothy’s proposed dumb-metric approach is slightly more “wrong” than a lovingly hand-crafted REF, BUT at the same time we get the equivalent of, say, 200 full-time senior scientists* working for a year, then perhaps that’s the sensible way to go.

    *200 is a massive guess, it would be interesting to figure out what it really is, and how much it costs.

  36. Dorothy
    “arguing like a homeopath”
    I can assure you that I still believe in Avogadro’s number.
    Have you read Seglen (1997)? http://www.dcscience.net/seglen97.pdf

    The problem with the bibliometric literature (like the IQ literature) is that it deals with a large number of intercorrelated variables. It seems to me far more informative to take people who command universal respect and look at their publication records – I did that for Bert Sakmann at http://www.dcscience.net/?p=182 If the bibliometricians were allowed to rule, he’d probably have been fired (and I certainly would have been). If it wasn’t so boring, I’d do this for a lot more people. But for me there is already quite enough evidence of the daftness of metrics, so the main priority is to prevent bibliometricians from corrupting science any more than they already have.

    There is also the statistical point I made above.

    The unintended consequences which you mention yourself, seem to me to outweigh other considerations.

    • I have indeed read Seglen, who makes a good case against the journal impact factor. Which, as I have noted previously, has absolutely nothing to do with the H-index.
      And I just checked Bert Sakmann’s H-index on Web of Science. It is a staggering 112. I rest my case.

  37. Matthew says:

    Dorothy: I am puzzled by your h index suggestion. In fields with long publication lags I can’t really see how it would work. You’d have an enormous disincentive to publish anything near the end of a REF period, and an enormous incentive to hold off until Year 1 of the next period.

    I also think the statistical properties of the h index are really weird. You can’t adequately capture the shape of an individual’s (or department’s) citation distribution with a single number. Indeed, the International Mathematical Union wrote:
    “[h indices] are often breathtakingly naïve attempts to capture a complex citation record with a single number. Indeed, the primary advantage of these new indices over simple histograms of citation counts is that the indices discard almost all the detail of citation records, and this makes it possible to rank any two scientists. Even simple examples, however, show that the discarded information is needed to understand a research record.”

    The fundamental problem with this debate is that it’s just not possible to accurately assess the quality of someone’s research until many years after it was published. Tons of work in the philosophy of science (e.g. Donald Gillies’s recent book) teaches us that, along with many historical examples. (How would Cantor have fared in the REF? All his peers thought he was mad, but now his work is the basis of all modern mathematics. Wittgenstein published only one book in his lifetime; he’d have been sacked these days.) If something cannot in principle be done, then the mechanism people choose for their futile attempts at doing it is kind of irrelevant.

  38. As Dorothy has already mentioned, I did a similar analysis to Dorothy’s for Physics. You can read about it here. However, my view of this is slightly more nuanced than Dorothy’s, I think. I’ve been very anti-REF for many of the same reasons as David Colquhoun. I find it remarkable that we invest so much time and effort into a process the results of which we can essentially replicate in a matter of hours. That, to me, is another reason why the process is flawed. Yes, we could replace it all with a simple h-index calculation, but as others have mentioned, that has its own problems (universities will try to game the system whatever approach we use). So, even though we could replace it (as far as I can tell) with some kind of metric-based approach, I personally wouldn’t be arguing for that. I’m much more in favour of what Helen Czerski has suggested, but I suspect that we no longer live in a world where such an approach would be appreciated by policy makers or by university administrators.

  39. Dear All,
    Thanks to Athene for raising this topic; it has been a very illuminating set of comments and made me aware of many perspectives other than my own.
    I’m going to have to call a halt to my own comments here, as the day-job beckons, but just a couple of responses to specific issues raised.
    1. Matthew: yes, the H-index is only of use retrospectively, and if adopted, you would be determining university funding on the basis of track record rather than current status – and you’d need to have long enough intervals between assessments for H-index to be meaningful. But, as I’ve stressed repeatedly, I am not wedded to the H-index; I am just noting that this is at least one simple metric that appears to capture most of what panel assessments are measuring in science subjects.
    2. Athene: I realise from your comments that you see the purpose of the REF as rather different from mine. Historically the RAE was introduced at the time when a lot of polytechnics became universities: in the early days, a block grant was just handed out by government to the (much smaller number of) universities, with little transparency as to how it was decided who got what. Once the number of universities expanded, people were horrified at the idea that all the available money might be divided up between all of them, and so some way of identifying research productivity was needed as a basis for deciding who got what. My view of the REF is that it is an exceedingly complicated way of making this determination, which in an attempt to be fair and thorough has got completely out of control. From your questions I deduce you see the REF as having a much broader role, in terms of incentivising particular kinds of behaviour, such as equal opportunities, fostering the young etc. So there’s a degree of mission creep from the original aims of the process. Not necessarily a bad thing, but it might explain some of our differences.
    Another source of difference may be the types of science we do: psychologists are well acquainted with measurement difficulties: it is part of our bread-and-butter. The kinds of rating of ‘research impact’ or ‘research quality’ that are used in REF are on an ordinal scale at best, and there is plenty of scope for subjective disagreement and bias. What we currently have is a system that takes a load of ordinal measurements of unknown (but probably low) reliability, measuring different things (research impact, environment, etc) and then combines them into a composite, and uses this as the basis for allocating research funding. It’s rather remarkable that the end result correlates with a more objective measure of citations, but it does suggest that insofar as there is a general index of research quality that is identified by panel members, it is well indexed by citations.
    I am convinced by this discussion that my plea for a move to greater simplicity will have no force at all, but fortunately I’ll have retired by the time we get to REF2020.

  40. Pingback: Measuring the intangible: lessons for science from the DRS? › Mola Mola

  41. Could someone please define ‘research quality’?

    There are highly-cited publications in Nature or Science that are of poor quality, but would REF reviewers really judge them as being poor?

    Are panel assessments similarly ‘right’ or similarly ‘wrong’ if they strongly correlate with the h-index?

    What is it that we want at the end? Are we playing games for distributing money between institutions, or do we have the ambition of advancing science (and humanity) at the end?

    And by the way, why don’t we use metrics for awarding the Nobel prize?

  42. I do indeed, as Dorothy says, believe the last RAE and the current REF examine other things beyond outputs, for which H indices may be somewhat relevant. I think it is good they do, because we are failing the scientists of tomorrow if, for instance, environments aren’t supportive and appropriate. One easy way of game-playing, present to a small degree last time, was to hire in people from abroad for a week or a month over the census period. They contributed nothing to the environment in which people worked, but they could be brilliant at pushing up a departmental H-index rating. If citations were the only thing that mattered, I think some very unpleasant consequences would ensue, which would be detrimental even if it saved a lot of time.

  43. Ach, I had meant to stop responding here, but I can’t let Athene’s latest comment go as it perpetuates a misunderstanding of what I have been trying to say.
    Athene: the departmental H-index is computed by finding articles based on the ADDRESS they come from. So parachuting someone in would not help you at all, unless they had publications from that address. This is why I stressed it is a retrospective measure.
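For readers unfamiliar with the mechanics being debated here, the calculation is simple once you have pooled the citation counts of every paper carrying the departmental address: find the largest h such that h papers each have at least h citations. A minimal sketch (the citation counts below are invented for illustration, not real Web of Science data):

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # the rank-th most cited paper still has >= rank citations
        else:
            break
    return h

# Hypothetical citation counts for papers pooled by departmental address:
dept_citations = [120, 45, 33, 9, 8, 8, 3, 1, 0]
print(h_index(dept_citations))  # → 6
```

Note how insensitive the number is to the top of the distribution: the 120-citation paper counts no more than the 9-citation one, which is exactly the loss of information the International Mathematical Union quote above complains about.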

  44. Bert Timmermans says:

    One more comment. Currently in Belgium (I don’t think it useful to post the links since it’s all in Dutch) there is a huge discussion about publication pressure and how universities are financed. It’s the usual protest, because Belgian academia has much the same issues as have been aired here: too much emphasis on quantity, too much pressure. Today I read one opinion that was actually much more nuanced (ok, for those who speak Dutch: http://www.standaard.be/cnt/dmf20130822_00703975).

    The title is “Aren’t there too many PhDs?”. The guy’s position is that we shouldn’t abandon objective measures, and that there simply has to be some justification with respect to the money we get; his arguments can be summarised as follows:
    – Publication culture seems to differ across disciplines, and most issues seem to be situated in the humanities.
    – The discussion is often just an exchange of slogans, and in reality many recruitment panels hardly look just at quantity. In fact, he says, when as a jury member for the ERC you focus your evaluation on H-index or citations/publications/impact, you make a fool of yourself, because it’s a telltale sign of someone who hasn’t read the proposal thoroughly. So perhaps we’re fighting a straw man.
    – Finally, and this may be his core point, he writes that the more fundamental problem might be the extremely competitive climate for young researchers that has arisen over the past decades. And that perhaps (given that the number of tenured positions has not really grown) we should be offering fewer PhD places, and stop stressing young people into fighting in a cage for That Position.

    Indeed, as another academic wrote, for every (even temporary) position, the number of not just good but excellent candidates that present themselves is bewildering, certainly if your lab has a name. Back in the day, he says, you could be happy if you got a PhD student or a postdoc at all.

    And, indeed, is it not the case that when a commodity is scarce, the price rises? Well, here it’s not so much the one who offers the post who raises the price; it’s the simple fact that there are people out there with a simply mind-warping publication record (in quantity and quality) at a very young age, and they will drown out the others – others who are not necessarily bad researchers. So the reality is that we may simply not be doing a generation of brilliant people much good by getting them all into the pyramid game.

  45. Athene’s raised the issue of wanting to include information about research environment, etc., in an assessment. However, while that information is in the RAE/REF submission, it apparently isn’t being adequately included in the assessment of that submission: RAE outcomes correlate highly with departmental H index, and that index contains no information about environment, etc.

    That doesn’t mean the H index is a good assessment; it mostly points to a problem with the RAE/REF assessment. So: is something like the REF worth keeping if we can improve the assessment aspect to include the information H indices explicitly do not account for?

  46. Isabella B says:

    All of Canada’s universities are publicly funded, yet there’s no REF exercise. How do they divide the funds?

Comments are closed.