My ‘Sick of Impact Factors’ blog post seems to have struck much more of a chord than I anticipated. At the time of writing it has attracted over 12,900 page views and 460 tweets, far higher than my usual tallies.
The post also generated over 130 comments, which is a daunting number for anyone who now stumbles across the post; but since the comment thread contains several useful ideas and insights from scientists and journal editors, I wanted to highlight them in a summary post.
Useful links
Several people flagged up papers and reports that I had not previously spotted. It is just this sort of productive interaction that makes the blog form so wonderful.
Ross Mounce pointed out that he had previously discussed the non-utility of Journal Impact Factors (JIFs) and linked to a useful opinion piece by Per Seglen.
Tom Webb mentioned a paper by Brian D. Cameron that provides the historical context for the JIF discussion.
Jason Priem, one of the authors of the Altmetrics manifesto, pointed to the Total Impact website (one attempt to put altmetrics into action) and to a Chronicle of Higher Education article on altmetrics.
But for me, Stephan Schleim provided the most interesting link, to a revealing comparison of means and medians between Nature and two Psychology journals. It provides a great example of the futility of JIFs in their current form and is a must-read#.
There were also interesting commentaries on the post from Telescoper, Bjorn Brembs, DrugMonkey and Tom Webb.
Concerns about metrics
Richard van Noorden wondered if the fixation on metrics is the root of the problem and suggested that article-level metrics are not likely to solve it. This concern was shared by others (Stephen Moss was at particular pains to emphasise the need to actually read papers when assessing them and their authors) and I would agree that the use of any metric runs this risk. However, the particular problem at the moment is the fixation on a single metric that is of no statistical merit.
David Colquhoun cautioned that there is no evidence that the use of metrics of any kind is a reliable way to judge scientific quality; however, as discussed in the comment thread (e.g. by Chris Chambers), there is no easy way to do a proper experiment on this.
It came to light from comments on the post and on Twitter that some funding agencies and university faculties require JIFs to be included in the lists of publications submitted in applications for funding or promotion. One commenter was explicitly instructed by his HoD to include JIFs to make it easier for the person assessing his application. There could be no keener illustration of the idiocy of the processes that the research community has to endure. I find myself wondering what the general public would make of this.
Some argued that the JIF should at least be maintained as a fair measure of journal ranking, but this view ignores the gaming and trickery that goes on (see comments by David Howey and Bob O’Hara), the inappropriateness of a mean for such a skewed distribution, and the arbitrariness of the two-year time window, which seriously undervalues some journals/fields (see especially the link posted by Stephan Schleim).
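To see why a mean is such a poor summary of a skewed citation distribution, here is a toy sketch (the citation counts below are invented for illustration, not drawn from any real journal): a single highly cited paper drags the mean — the JIF-style summary — far above what a typical paper in the journal achieves.

```python
from statistics import mean, median

# 20 invented citation counts: most papers gather a handful of citations,
# one blockbuster gathers 400
citations = [0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 12, 15, 400]

print(f"mean   = {mean(citations):.1f}")   # ~24, dragged up by the one outlier
print(f"median = {median(citations)}")     # 4.0, what a typical paper gets
```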
Steve Caplan suggested that people will always want to aim for ‘high-ranking’ journals, whether or not they are judged by impact factor. I sense this view is widely held but would like to suggest that the rise of PLOS ONE shows that the landscape is changing. Rather than relying on a journal brand (and suffering the delays of working our way down the rankings during manuscript submission), we should be aiming to get good quality work published quickly, after peer review that focuses only on the quality and coherence of the manuscript rather than on guesses about its likely future significance. The community is well placed to judge significance after publication; the key — and non-trivial — question is how to go about that.
What alternative metrics (altmetrics) might we use instead of the JIF?
Setting aside for a moment the concerns listed above, no one (myself included) seemed to have clear ideas on what metrics to use, though there is plenty of development work going on in this area. Some wondered whether tweets or Facebook likes should be counted as a meaningful statistic, but there was little enthusiasm for such measures, in large part because they are effortless to make and so run the risk of representing snap or ill-considered judgements.
I think the key will be to gather information from people that we trust. That trust needs to be established over time (as on Twitter) by a consistent pattern of linking to quality material or making thoughtful comments. Development of unique identifiers (as suggested by Cromercrox) might enable us to build a community like the Faculty of 1000 but much bigger and offering greater coverage.
But the whole business of metrics remains rather foggy.
What action should we take?
It was great to have such a vigorous response to my post but I would like all of us who have taken an interest in this issue to make sure that we take action of some sort. Can I suggest that we:
- Keep banging on about the futility and idiocy of using JIFs
- Get funders and university administrators to publicly disavow their use and issue clear guidance to panel members and reviewers. The Wellcome Trust has made a start (see last bullet point here).
- Encourage wider use of open peer-review, where reviewers’ and authors’ comments are published alongside papers; this may foster more online commentary post-publication*.
- Support the adoption of unique identifiers for authors, to be used when publishing or making any online comment on work — this should help to build reputations and trust among the commenting community*.
If anyone has additional suggestions, please add them to the comments.
#Update (20-8-2012; 21:10): But see comment from Pep Pàmies and ensuing discussion.
*These suggestions were prompted by a comment from Cromercrox
Some more reactions:
http://phylogenomics.blogspot.com/2012/08/some-articles-on-uses-and-misuses-of.html
and
http://neurodojo.blogspot.com/2012/08/chess-ratings-and-impact-factor.html
The phylogenomics link above provides a useful listing of articles that discuss the use and mis-use of JIFs.
~13,000 page views and 130 comments! Now THERE’S an impact factor!
🙂
I could not believe that the IF (a measure of the average number of citations) and the median number of citations per research article do not correlate, as has been claimed above (http://www.frontiersin.org/quantitative_psychology_and_measurement/10.3389/fpsyg.2010.00215/full) and elsewhere as well: http://bjoern.brembs.net/comment-n865.html
I therefore did a little exercise and gathered a few data points. See the result here:
http://pic.twitter.com/QVMQkeZg
The correlation is clear, as one would expect. The IF works for journals. Obviously, it does not work when extrapolated to individual publications or to individuals, as amply discussed.
I guess that the author of the contribution at Frontiers in Quantitative Psychology and Measurement cited above included in the calculated median all the contributions published by the journal Nature. These include a very significant number of news articles, which are rarely cited, and which specialist journals rarely publish. Instead, in my graph I compared apples to apples.
I don’t think we should get rid of the IF per se. We should, however, make clear to funders, policy makers and scientists that the IF can’t be extrapolated beyond its meaning: a measure of the quality of research journals according to the citations received by the original research they publish.
But until useful and better metrics (from cleverer algorithms than those we have today) associated with individual papers and with the output of individual scientists become widely adopted, the mistake of extrapolating the IF will, realistically, be difficult to eliminate.
Hi Pep
Thanks for doing this — I admire your industry. It’s worth noting that Mayor’s analysis also reports a correlation between the mean and median numbers of citations for the two psychology journals that he analyses. The only place where he differs from you on this is in the behaviour of citations to Nature papers. It would be good to get to the bottom of this. I have emailed him this morning to ask if he would like to comment.
Mayor’s analysis also usefully highlights the variations in the temporal patterns of citations for different journals, something that is not commonly reported. In effect, it is therefore masked by the rather thoughtless use and promotion of the JIF every year.
I agree that elimination of the JIF won’t be easy but given its many weaknesses and the widespread harm that it causes because it is so frequently mis-applied, there is an urgent need to do so.
I agree, something looks to be seriously wrong with that Frontiers figure – a median of ~5 and a mean of ~100? That implies a seriously bimodal distribution (e.g. lots of 0s, lots of >>>100s). I keep going back to the point about empirical rigour being thrown out as soon as natural scientists start dabbling with bibliometrics – your neat analysis excepted, of course!
Also – Bob O’Hara mounted a defence of arithmetic means, even for skewed distributions, on the previous blog (with which I agree). But I’m surprised that people don’t jump straight for the median, rather than e.g. the geometric mean.
You try calculating the geometric mean for impact factors, and you’ll see the problem. 🙂
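For anyone wondering what Bob is alluding to, a minimal sketch (with invented citation counts) makes the problem plain: real journals always have uncited papers in the window, and the geometric mean is undefined as soon as a single zero appears, unless you resort to ad hoc fixes.

```python
import math

# Invented per-paper citation counts for one journal's two-year window;
# zero-citation papers are the norm in real citation data.
citations = [0, 0, 1, 2, 3, 5, 8, 40]

def geometric_mean(values):
    # The geometric mean is exp(mean(log(x))); log(0) is undefined, so a
    # single uncited paper breaks the calculation.
    if any(v == 0 for v in values):
        raise ValueError("geometric mean undefined: uncited paper in the window")
    return math.exp(sum(math.log(v) for v in values) / len(values))

try:
    print(geometric_mean(citations))
except ValueError as err:
    print(err)  # the usual workaround, taking log(x + 1), is entirely ad hoc
```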
I managed to track down Julien Mayor (now working in Geneva) by email and he was kind enough to allow me to post this excerpt of his response:
Or: I used dodgy statistics to raise awareness of dodgy statistics. Not the strongest of arguments…
Broadly speaking, we shouldn’t let our belief in what they show temper our scepticism of implausible numbers.
Fair point, though it’s worth noting also that Vanclay points out the dodgy process of selection of citing and cited publications inherent in the calculation of the Thomson Reuters Impact Factor.
Nevertheless, you’re right that we should always be clear about methods and measures, and I’m grateful to Pep Pàmies and Julien Mayor for helping to clarify this particular issue.
I fully agree with Tom; the use of dodgy statistics can’t be justified.
The contribution of news, commentaries and editorial pieces to the IF of Nature Materials is 5.4% (see my editorial of last year). I suspect it is a similar number for Nature (most likely lower, as it publishes a much higher percentage of news pieces). It is then unfair to include under the same umbrella news articles and original research articles, which clearly have different purposes and very different citation rates. This is why I also excluded citations to review articles from my data (note the mistake in the footnote) for the medians (provided that in Web of Science original research articles, review articles and news articles are appropriately tagged).
But am I right in thinking that the Thomson Reuters IF calculation includes news articles and research articles? This is what I have understood from Vanclay’s paper and your Nature Materials editorial. Which would make Mayor’s analysis not out of line with Thomson Reuters’ practice…
That said, I agree that more clearly defined stats are better!
Yes, the Thomson Reuters IF includes citations to news articles in the numerator, but they do not add to the denominator. This has received fair criticism, as the IF favours journals that publish a significant amount of non-primary research content. However, the 5.4% contribution to the IF from citations to non-primary research articles for Nature Materials (I would bet the number is similar for most if not all Nature journals) can be considered a minor contribution.
Mayor’s analysis of the median includes news articles, which strongly biases the result. The 5-year median of citations for Nature is 37, but Mayor’s figure shows numbers around or below 5. This is because the journal publishes a higher number of news items and editorial material than original research papers.
Note: The figure ’37’ mentioned in the above comment applies to original research articles published in the latest 5 years:
http://pic.twitter.com/QVMQkeZg
https://twitter.com/PepPamies/status/237298698118393857
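As a side note, a minimal worked example of the numerator/denominator asymmetry Pep describes above, using invented figures rather than real Thomson Reuters data, shows how citations to non-primary content nudge the calculated IF upwards:

```python
# Invented figures for a hypothetical journal over a two-year window
research_papers = 800       # citable items (the denominator)
research_cites = 24_000     # citations to those papers
news_cites = 1_400          # citations to news/editorial items (numerator only)

if_as_calculated = (research_cites + news_cites) / research_papers  # 31.75
if_research_only = research_cites / research_papers                 # 30.00

print(f"IF as calculated        : {if_as_calculated:.2f}")
print(f"IF from research alone  : {if_research_only:.2f}")
print(f"share of IF due to news : {news_cites / (research_cites + news_cites):.1%}")
# ~5.5% here, comparable in size to the 5.4% Pep quotes for Nature Materials,
# but these numbers are made up for illustration only.
```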
Should we expect to find a transcendent metric applicable to all aspects of scientific contribution? I don’t know. Perhaps the objectivity of scientific pursuits needs to be checked, to some degree, when gauging the ‘impact’ and contributions made by our fellow scientists and science writers. This notion is somewhat mentioned above in regard to social media as a variable in the metric equation. Although we are taught to leave our individuality and personal bias at the door of science, I am sure we all have a subjective affinity for those who we know will spark our imagination and propel us toward our own personal breakthroughs. How do we cite this? The ‘LIKE’ button? When I am inspired by work or concepts presented by people from other disciplines (doesn’t necessarily have to be scientists), how do I make it known where I got that idea from? After all, at that moment it was high impact to me!
I agree with Stephen Moss (above) – people need to read papers to adequately judge their peers. If you want to really know how impactful someone is, read the discussion section of their research articles – do they give you everything they’ve got and inspire you to reflect on your model system and paradigms of interest, or do they play their cards close and remain cryptic?
Identifying this metric is going to be tough, but I think we need to be diligent in keeping track of who is really impacting our own scientific efforts. Maybe then something will emerge.
I have tweeted the URL to this article (to my very few followers), adding an important comment/question. While — as stated in the criticism — the faults of IFs are anything but unknown, what can junior and mid-level researchers do here and now? It will surely take time until (if ever) the current anachronistic system changes. Maybe there are some “statistically literate” readers out here who could devise an interim way of picking journals for researchers that need to publish now? Or researchers who know people (who know other people…) able to do that? E.g. I suggest submitting where established standards are met while the alternative metrics’ criteria discussed here would also apply? It probably won’t work with the most salient journals, but I am thinking of journals with an IF of 1-10. Call it a “foot in the door” technique…
Please discontinue my subscription, Stephen. I didn’t know how and couldn’t find your email address on the blog. Since postings are moderated, I am hopeful that I’m not adding any distraction to the comments.
I am still disappointed by the lack of responses — everyone seems to be engrossed by the topic itself, the metrics, the stats, review procedures et cetera. I’ve yet to see something pragmatic.
Thanks,
@PhilippKwon
Hi Phil
I’m not in control of subscriptions but if you are getting emails about comments on this blog, there should be a link in there that allows you to unsubscribe.
I only saw your comment this morning and haven’t responded yet because I was still pondering what you meant. Could you clarify?
I agree that your question is a serious one and that an extensive comment thread on a blog will do little to change things. At best it can raise awareness of the problem, but the fact that this problem has been with us for some time shows the level of difficulty in solving it. Junior and mid-level researchers are in an especially difficult position since for them to play outside ‘the rules’ would entail significant risk to their careers. As I wrote on the previous post, it is incumbent on senior researchers, funders and university administrations to drive the required cultural change, and we need to think about practical measures to achieve that. The present campaign is only a start.
I like the suggestion of open peer review. This is linked to one thing I feel quite strongly about – signing one’s name on a review. They’re your comments and opinions; you should stand by them rather than hiding in anonymity!
I kind of agree with this, in that I think openness as a general principle is good. However, there are certainly occasions when not having the option of anonymity would lead me to be less honest in my review. Usually when I know / like / respect the authors (inevitable in a smallish field), but think the work is poor. So, my good reviews would likely get signed, my bad reviews, probably not. That may be a human failing on my part, but then I’m a fallible human being…
Sure, this is perhaps a problem, but so long as people can differentiate between “work Tom*” and “friend Tom” there’s no need for any problem. But as you quite rightly say, humans are fallible and unfortunately people sometimes confuse “work Tom” with “friend Tom”.
*I use your name for simplicity’s sake
*I* struggle to make that separation much of the time 😉
Honestly though, I think I (and, I suspect, many of my colleagues) would decline a lot more review requests than I do if openness were required. Which is a shame.
Stephen, I really enjoyed your blog posts and the comments. Two comments:
Open Researcher & Contributor ID (ORCID) is the most ambitious project to date to issue unique identifiers for researchers. The service will launch in October.
Article-Level Metrics and altmetrics are exciting new developments, but it is probably too early to fully understand what all these numbers mean and how we should use them. Looking at the PLOS Article-Level Metrics data, it appears as if Twitter and Facebook numbers measure something very different from citation counts (CrossRef, Scopus or Web of Science), and social media activity is not really correlated with scholarly citations.
Disclaimer: I’m involved in both the PLOS Article-Level Metrics project and ORCID.
Thanks for attempting to summarise the debate following your blog, Stephen.
Given the nuanced and broad nature of the discussion about pros and cons of JIFs and bibliometrics, isn’t the summary point “the futility and idiocy of using JIFs” a tad extreme? “Use with caution”, perhaps.
Andrew
No – this is a campaign. Faint heart never won a fair lady, as they say. The more nuanced argument about the relative merits of means and medians is a minor aspect of the discussion. The extent and longevity of the damage wrought on people’s careers and the body of science by the continuing misuse of impact factors is the more important issue and warrants action.
To amplify my reply since you challenged me on this on Twitter: I don’t think it’s extreme, especially given the voluminous response to this post. The abuse of impact factors within the scientific community is a major issue and the vast majority of responses that I have had from the scientific community show that there is deep frustration, a sense of despair, even, that this is a predicament from which we will never find a way to escape.
That said, we have to find a way out, and that will need both practical measures and action. Advice simply to ‘use with caution’ won’t hack it in my view, but publishers can certainly help by prominently displaying health warnings about the meaning and ease of misinterpretation of the impact factor whenever it is touted on their websites or promotional literature.
I think it would be helpful if they provided links to show the actual distribution that the mean is derived from, indicated the number of citations of their most and least ‘popular’ papers and reiterated the point that, while JIFs are a crude comparative measure of journal performance, they cannot meaningfully be applied to a single paper or individual.
Thanks for the reply. My puzzlement was simply around the extreme conclusions drawn from a complex topic which doesn’t easily permit a binary ‘JIF good’ or ‘JIF bad’ response. I do agree that publishers could perhaps help the community understand JIFs and other metrics (e.g. SJR, SNIP) better. I do try to contextualise it and other metrics in journal board meetings and so on.
Well, I didn’t detect that much dissent here from the view that JIFs have come to have awful, pernicious effects. If you can point me to a robust case for them, I would be happy to take it on board.
I very much support the idea of unique contributor identifiers. There is ResearcherID but it is cumbersome and its intent is to make ISI data more valuable… Instead, I’d think that scientists would be willing to identify themselves through an open system in which they “sign-off” on their particular contributions/collections. Perhaps in countries where there is a profusion of common names, there are already such tools but it’s in all of our interest to remove confusion. ORCID (see above post) sounds promising but the partner list contains some of the usual suspects: http://about.orcid.org/content/launch-partners-program
All of these issues are very valid, but they also ignore the other side of the Impact Factor coin, which is that most journals are now starting to seriously game the system. If you want to see the most egregious ways that many journals game the system, check out a recent post of mine on my blog here. I’m the Editor-in-Chief of the American Naturalist, published by the University of Chicago Press (UCP). Luckily for me, UCP puts no pressure on its journals to game the Impact Factor, and so we don’t. However, my position gives me a ring-side seat to watch what other journals are doing.
Mark
Thanks Mark – but this issue has not been ignored. In fact, in this post I highlighted comments from David Howey and Bob O’Hara that specifically mentioned the problem of journals gaming their impact factors. Such practices should certainly be called out when they are discovered.
I agree with everything that has been written in the original article and summarized in the coda but find it surprising that one aspect was not mentioned (or not included in the coda):
The use of and subsequent dependence on any kind of metric is a direct result of the way academic research funding is currently doled out. Project and position funding is artificially kept low and more or less useful evaluation schemes are then used to decide who gets a portion of an underfunded budget and who, while maybe deserving, does not.
Your proposal basically amounts to accepting that academic research remains underfunded and finding a “fairer” method of allocating the scraps. The alternative would lie in challenging the narrative that publicly funded research should have a lower priority than private R&D. Once the pie grows larger, for instance back to pre-neo-liberal size, metrics for selecting applicants become much less relevant in terms of acquiring funding at all and unless I am mistaken, individual merit within research communities is far less tied to such metrics.
I’m not sure what you mean by the statement that funding is kept ‘artificially low’? I am well aware of the constraints on public spending on science in the UK (where I have actively campaigned for increases) and elsewhere — and that they are particularly acute in these straitened times.
Nevertheless science has always been a competitive business and a degree of competition is healthy, not least to convince our paymasters that we are constantly sifting and comparing the best scientists and the best proposals. The difficulty we have got into, which pre-dates the present economic woes, is the over-reliance on a simplistic interpretation of a flawed metric.
Any government issuing its own currency, which includes the UK government, is in fact not constrained in its spending.
That research spending has gone down, that the number of permanent or tenure-track positions has decreased, and that competition for funds has ratcheted up in the UK and the US (and in Europe even before the onset of the Euro) has everything to do with political considerations. The sparsity of funds is in that sense artificial.
I agree that competition is healthy and necessary but the simple fact that there were much larger science budgets not too many decades ago without competition dying out shows that there’s a different way.
I totally agree that yours is a worthy cause and sorry for jumping in with my comment without checking up on your efforts. But I am afraid that as long as the competition is as fierce as it is nowadays (and the austerity-madness points towards it getting fiercer) any metric will lead to researchers overfitting to it (or gaming it if one takes the cynical view).
So I guess what I am hoping for is that everyone who puts time and energy into discussing what’s wrong with the impact factor and related metrics and how to fix it, puts even more time and effort into lobbying for increased research spending.
hear, hear!
Is it only me, or does someone else see deeper problems lying behind the IF and the measuring obsession? My impression is that nowadays science is broken by a highly hierarchical, co-optation-based system, in which a few barons have most of the work done by hordes of young wannabe barons, of whom only a small minority will survive. Add to this the fact that judgements of results are based on stupid metrics and other flawed measures. The result is that individuals from the hordes bow to the barons, don’t argue with them, don’t dare to be mavericks, for the system promotes those who take just small steps forward that confirm the mainstream mindset, and penalises (first of all with a lack of positions and grants) those who have the most innovative and risky ideas.
Another result is an exacerbated individualism and levels of competition that sometimes verge on complete idiocy. Everyone fights to come first, to stand out from the crowd, often giving less importance to the value of the work done, or even to the fact that you have colleagues who deserve honesty and respect. For instance, take data-exchange standards: so many articles and collective efforts claim that they are promoting a new open format or ontology because it’s better than anything seen before, and they’re collaborative and contributors are welcome and bla bla. The reality is that most projects reinvent the wheel and contribute to an evil proliferation of similar-yet-incompatible pet formats or ontologies. If you’re a young individual with good new ideas in this field, the chances that you’ll succeed in spreading them depend strictly on the faction you’re able to join. And anyone involved in this kind of research knows the fierce clan wars that happen behind the scenes. It’s not by chance that we now have sayings like ‘the good thing about standards is that there are so many of them’, or ‘ontologies are like underwear, nobody wants to use someone else’s’.
Of course I’m in favor of ditching the JIF and placing more value on nano-publications and crowd-sourcing. Regarding the latter, I think it’s not simply about counting the number of ‘likes’. It is quite possible to measure people’s reputation based on a range of criteria, from the number of citations to contributions to data repositories and open international projects. It’s quite possible to weight things better by exploiting social networks to do things like social network analysis and traffic analytics, as is already done for marketing purposes.
But certainly the point is not just which metric to use for the objective evaluation of scientific value. There are also other aspects, such as people’s willingness to collaborate and leave selfishness a little behind, or the rewarding of the bravery needed to attempt new routes and pioneer new approaches.
I didn’t ignore it, Marco — I referred to the particular difficulties faced by junior researchers in breaking out of the culture of the IF in my original post. However, you are right to draw attention to it. The dependence of younger researchers on the lottery of publication in a high IF journal has been discussed at Occam’s Typewriter and elsewhere several times in the past several years. But it’s a problem that will never be solved until the IF itself is broken.
A commenter who contacted me by email and preferred not to be named brought my attention to a short commentary by Prof Richard Ernst, who was sick of impact factors back in 2010. His piece ends with another rallying cry:
I am a postdoc researcher in Engineering. This is my opinion about impact factors and journal quality. Everyone in this field knows which journals are the best ones. They also happen to have the highest impact factors. It is much more difficult to get a paper accepted in these journals than in low impact factor journals. There are even many “fake” journals without impact factors that only want your money and where acceptance is easy… To publish in a good journal the authors need to spend much more effort in making the paper high quality. The analysis should be beautiful, the literature review more complete, and more realistic assumptions should be used. The writing should also be very good (this also makes for a better paper, since there are so many papers around that readers should not have to waste their time on badly written ones). All the real researchers I know aim to publish in the best journals. If you start counting bad quality journals as equal to good journals, then there is no reason to do good research anymore. Also, I know for a fact that good researchers do not want to do reviews for bad quality journals. I too only accept review requests from the best journals, and then I really do an intensive review. To conclude, I would say that at least in Engineering the journal where a paper is published is in fact a highly important and valid metric for evaluation. The quality required for publishing in those journals is so much higher than in bad journals. Of course, sometimes people manage to publish bad papers in good journals too. But then, if you have a good paper, why not publish it in a good journal? Sometimes papers published in high quality journals do not get many citations, but usually the scientific quality is still good in those cases. And sometimes “bad” papers get surprisingly many citations. To take both of these aspects into account, I would personally evaluate researchers by combining two factors: 1) the number of papers in high quality journals and 2) the h-index. These days the h-index is very easy to find using Google Scholar, as authors can make their own profile page and track their h-index and citations easily; they just need to go through the list of papers, add any missing papers and remove extras. It seems to find the h-index accurately compared to other databases. Also, there should be enough first-authored publications (in Engineering the first author is the main author). Thank you for reading.
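For readers unfamiliar with the second factor the commenter proposes, here is a minimal sketch of the h-index calculation (h is the largest number such that h of an author’s papers have at least h citations each); the citation counts are invented for illustration:

```python
def h_index(citations):
    """Return the h-index: the largest h such that h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:   # this paper still supports an h-index of `rank`
            h = rank
        else:
            break
    return h

# Invented citation counts for ten papers by one author
print(h_index([25, 17, 12, 9, 8, 6, 3, 2, 1, 0]))  # -> 6
```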
With respect to metrics, please check http://www.lifescience.net.
We introduced a community-based publication rating system. Also check how we organized the network of institutions and their departments and groups that researchers can connect to.