The Impact Factor Revolution: A Manifesto

Bora has a post up about impact factors that links to a discussion in Epidemiology. It’s the usual stuff: how awful they are, with all their well-known problems. We all know this, but of course it doesn’t stop us all checking the IFs of the journals we want to submit to, and assiduously following our h-indices (if anyone would care to cite this paper or this one I’d appreciate it. Thanks).


I mention this for a couple of reasons. First, the literature on impact factors, h-indices etc. is diffuse: everyone in science has an interest, so they get discussed all over the place: in ecology, chemistry, physics, mathematics etc. It would be nice if all this literature were brought together in a single review. Or perhaps it has been, but somewhere obscure. Web of Science gives almost 1500 hits for the term, which will take a bit of time to go through. Connotea has about 170 entries with the tag “impact factor”, which is easier to handle. But if anyone knows of an overview, please shout up! I guess this is also a subtle plea for you to add any other studies you know of to Connotea.
My second reason springs from thinking about Brian Derby’s recent post about the h-index and self-citation. This prodded me to think about what the fundamental problem with impact factors is. My conclusion is (not surprisingly) that we’re all doing it wrong. All the indices that have been proposed are simple to calculate, so it is easy to grasp what they are. But it is less easy, or indeed impossible, to understand what they tell us. A journal with an impact factor of 5 is better than one with an IF of 2. But better in what way? Try answering that question without parroting the definition of an impact factor, and you’ll see the problem. It is difficult to understand what exactly these metrics are meant to measure.
The hope is that impact factors will correlate with quality (whatever that is), but although the metrics can be shown to behave in intuitively right ways, they can also be shown to behave badly in other ways. But this is all based on some intuitive understanding of how they should behave, and without making that understanding explicit, it is difficult to see how to make progress.
I think the problem should be tackled from a different direction, by trying to model the phenomenon of citation. Mathematically, we can think of citations as events, and we want to count these events. There is a set of models called, wait for it, counting processes, which are used in (you’ve guessed it) event history analysis. They model the way in which the count of events increases over time. For example, they can be used to model epidemics, where the count is of the number of people (or animals or plants) coming down with the disease. The process can be described by how the rate of events changes over time or between different epidemics. The modelling would then be of the rate of citation, and how this changes over time and between journals and authors.
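To make that concrete, here is a minimal sketch of citations as a counting process. It is not anyone’s fitted model: the rate function, its parameters and the simulation horizon are all invented for illustration. The citation rate rises after publication and then decays, and events are simulated by thinning a homogeneous Poisson process.

```python
import math
import random

# Illustrative only: a nonhomogeneous Poisson (counting) process for the
# citations to a single paper. The rate rises after publication, peaks,
# and then decays; none of the numbers come from real data.

def rate(t, peak=3.0, scale=2.0):
    """Citation rate (citations per year) at age t years."""
    return peak * (t / scale) * math.exp(-t / scale)

def simulate_citation_times(horizon=10.0, rate_max=1.2, seed=1):
    """Simulate citation times on [0, horizon] by thinning a homogeneous
    Poisson process with rate rate_max, which must bound rate(t)."""
    random.seed(seed)
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate_max)          # candidate inter-event time
        if t > horizon:
            return times
        if random.random() < rate(t) / rate_max:   # accept with probability rate(t)/rate_max
            times.append(t)

times = simulate_citation_times()
print(len(times), "citations in 10 years, the first few at",
      [round(x, 2) for x in times[:5]])
```

The counting process is then just N(t), the number of events up to age t; different papers, journals or authors would get different rate functions.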
We would expect a better paper to be cited more often, so the overall rate should be higher. But some papers are “slow burners”, only accumulating citations slowly. Others burn brightly, getting lots of citations for a short period before becoming passé. Judging the relative worth of these two types of paper might be difficult, but we can make progress by modelling the process of citation. At the very least we can reduce the problem to comparing a few parameters.
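As a toy illustration (all numbers invented, and chosen so the two papers have the same expected lifetime total), compare a paper that burns brightly with a slow burner, each given a simple exponentially decaying rate:

```python
import math

# Two hypothetical papers with rate a * exp(-t / s): "flash" starts at 10
# citations/year but fades within a couple of years; the "slow burner"
# starts at 1/year but keeps going. Both have an expected lifetime total
# of a * s = 10 citations.

def expected_count(a, s, t):
    """Expected citations by age t for rate a * exp(-u / s)."""
    return a * s * (1.0 - math.exp(-t / s))

papers = {"flash": (10.0, 1.0), "slow burner": (1.0, 10.0)}

for name, (a, s) in papers.items():
    print(f"{name:11s}  2-year: {expected_count(a, s, 2):4.1f}"
          f"  10-year: {expected_count(a, s, 10):4.1f}"
          f"  lifetime: {a * s:4.1f}")
```

A two-year window (roughly what the impact factor uses) sees almost all of the first paper’s citations and hardly any of the second’s; the parameters a and s are what let us say so explicitly.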
So what? First, by reducing the citation pattern down to a simpler model, we can estimate the parameters of the model and predict what will happen. This is like predicting how many people will go down with ’flu when the epidemic is just starting: the pattern is known well enough from previous epidemics that we know how the next one will behave. For citations, we have enough past papers to study that we should be able to get a good handle on what may happen to a paper. There could be considerable parameter uncertainty: a paper could be a slow burner, or it could be destined for obscurity, but this affects the uncertainty in the predictions, not whether predictions can be made.
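To make the idea of predicting ahead concrete, here is a rough sketch under the assumption (mine, purely for illustration; a real analysis would compare several rate models) that the citation rate decays exponentially. The citation times are made up. We fit the rate to what was seen in the first two years by maximum likelihood, then extrapolate to five years.

```python
import math
from scipy.optimize import minimize_scalar

# Hypothetical citation times (in years since publication) observed so far,
# and the length of the observation window.
obs = [0.1, 0.2, 0.4, 0.5, 0.8, 1.1, 1.5, 1.9]
T = 2.0

# Assumed model: Poisson process with rate a * exp(-t / s).
# Log-likelihood = sum(log rate(t_i)) - integral of the rate over [0, T];
# for a given s the MLE of a is available in closed form, so we only need
# a one-dimensional search over the decay scale s.

def profile_neg_loglik(s):
    n = len(obs)
    a_hat = n / (s * (1.0 - math.exp(-T / s)))        # MLE of a given s
    return -(n * math.log(a_hat) - sum(obs) / s - n)  # the integral equals n at a_hat

fit = minimize_scalar(profile_neg_loglik, bounds=(0.1, 50.0), method="bounded")
s_hat = fit.x
a_hat = len(obs) / (s_hat * (1.0 - math.exp(-T / s_hat)))

predicted_5yr = a_hat * s_hat * (1.0 - math.exp(-5.0 / s_hat))
print(f"fitted decay scale: {s_hat:.2f} years; "
      f"predicted 5-year total: {predicted_5yr:.1f} citations")
```

A fuller analysis would also propagate the parameter uncertainty into the prediction, which is exactly where the slow-burner-versus-obscurity ambiguity shows up.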
So, from a partial pattern, we can predict ahead. From this we should be able to estimate, for example, the total number of citations a paper will get (that’s the area under the rate curve), or the number it will have in, say, 5 years. We would also be able to average over all papers in a journal, or all papers by an author, and hence get scores for journals or people.
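In symbols, with λ(t) the citation rate of a paper at age t (one possible formalisation, not the only one):

```latex
\begin{align*}
  \text{expected lifetime citations} &= \int_0^\infty \lambda(t)\,\mathrm{d}t ,\\
  \text{expected citations within 5 years} &= \int_0^5 \lambda(t)\,\mathrm{d}t ,\\
  \text{score for a journal or author with papers } i=1,\dots,n
    &= \frac{1}{n}\sum_{i=1}^{n}\int_0^5 \lambda_i(t)\,\mathrm{d}t .
\end{align*}
```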
Once we have this sort of model of citation, we can use it to create measures of quality. We can define precisely what our measure of quality will be: the total number of citations for an average paper, the number in 5 years, etc., and then calculate those based on the model. Of course, this can be done even if the paper is still being cited; the standard errors are just higher. The advantage is that the quality measures can be defined so that they are understandable: we create ones which measure something that is easy to understand, and then apply that to the model. If you want to compare quickly and slowly cited papers, a discount can be applied, so that later citations count less. The discount can be chosen as one wants, and questions about its validity can be aimed explicitly at the discounting, i.e. we can see where in the calculation the problem lies.
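A sketch of the discounting idea, with an arbitrary illustrative discount rate (δ = 0.2 per year; choosing δ is precisely the judgement call to argue about): each citation at age t counts e^(−δt).

```python
import math

# Illustrative discounted citation score: a citation t years after
# publication contributes exp(-delta * t), so later citations count less.
# The discount rate delta is a free choice, not a fact about the data.

def discounted_score(citation_times, delta=0.2):
    """Sum of exponentially discounted citations (delta in 1/years)."""
    return sum(math.exp(-delta * t) for t in citation_times)

# Five early citations versus five spread over a decade.
print(round(discounted_score([1, 1, 1, 1, 1]), 2),   # ~4.09
      round(discounted_score([1, 3, 5, 7, 9]), 2))   # ~2.15
```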
The point of Brian’s post was to flag the problems of self-citation in indices. But for the modelling approach this isn’t so much a problem as an opportunity to extend the model and get another (well cited) paper. Self-citations and citations by others can be modelled together, so we can separate out their effects, and also clearly see the self-citers for who they are. The term “marked point processes” will appear in the paper somewhere.
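One way to write that down (a sketch; other parameterisations are possible): each citation carries a mark saying whether it is a self-citation, and the overall rate splits into two components that can be estimated separately.

```latex
\[
  \lambda(t) = \lambda_{\text{self}}(t) + \lambda_{\text{other}}(t),
  \qquad
  \Pr(\text{self-citation} \mid \text{citation at age } t)
    = \frac{\lambda_{\text{self}}(t)}{\lambda(t)} .
\]
```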
I would argue that this approach lets us define quality in a quantitative way (and argue endlessly over that, of course), and then measure it explicitly. We will know precisely what we are measuring, and we should be able to understand more easily any unintended effects on the measure. For example, if we define the quality of a researcher by the number of citations per paper they get in 5 years, it will be easier to say “yes, but Prof. Trellis’ papers are well cited for a year, but then everyone gives up, whereas Dr. Lyttleton’s papers would continue to be cited for decades”. And we can back that up with the numbers.
Sounds great, doesn’t it? There are a couple of problems:

  1. Finding a good model. There are several possibilities, of course, and none will be perfect. They will have different assumptions, and may or may not fit. The challenge is to get something about right but not too complex. But this is true for any statistical modelling.
  2. Getting the data. Someone has to trawl through Web of Science for a few months.
  3. Finding the time/money: The big one! If someone gives me the funding, I could try and hire a student or -post-doc- researcher. This is the usual problem: I also have N other projects I should be working on. Anyone want to collaborate?
  4. Persuading people to use the methods: We can do the modelling, but if nobody is going to use them, does it matter?
  5. Finding simple approximations: I suspect a big problem would be that the computational time will be too long. So, finding something simple to calculate, but strongly correlated with the complex measure, would help.

Once all this is done, I will be able to take over the world. Evil laughter is optional.

19 Responses to The Impact Factor Revolution: A Manifesto

  1. Brian Derby says:

    Bob, there are a number of studies and articles in the literature looking into h-indices and impact factors. These go back over 10 years. A good example from 1997 in the BMJ is here. This covers most of the points that have been reiterated since then.
    There is an interesting and informative presentation here from a publisher that explains impact factors, half-lives and immediacy, and compares them across some subject fields.

  2. Bob O'Hara says:

    Thanks, Brian. Even more to read!
    It looks like we have to include the http:// at the start of our links, otherwise the software thinks it is a relative link (i.e. in a sub-directory of this page). This should be your second link.

  3. Martin Fenner says:

    Bob, I suggest the following Nature Network experiment: we (that is those of us interested in impact factors and other bibliometrics) write a paper together. This could be a review or a research paper and I may have some data to be analyzed.

  4. Cath Ennis says:

    I wonder what percentage of papers are found by people trawling through tables of contents, compared to one-off and automated keyword searches. I use a mixture of the two methods, but surely if more people start using keyword searches, the impact factor will start to matter less? i.e. people are searching by scientific content rather than by a specific journal’s reputation. I’ve heard anecdotal evidence of papers in really obscure journals that end up being extremely highly cited.
    (Unfortunately these papers are not mine; a quick trip to Web of Science reveals that my own citations seem to scale with journal impact factor. So much for that theory).

  5. Bob O'Hara says:

    Martin – that’s an interesting idea. Anyone else up for it? The review might be the best way to go to start off with. If I get some time I want to play around with the ideas above, so I’ll have to report back. The problem is it gets technical very quickly.
    Cath – that’s something I hadn’t thought about. It might be visible in the impact factors. I wonder if anyone’s looked. The reason you can’t see it in your own papers is, of course, because it’s your papers that are driving the journals’ impact factors.

  6. Cath Ennis says:

    I’d thought of that, but I think it’s highly unlikely that my one paper in PNAS has affected the journal’s impact factor!

  7. Maxine Clarke says:

    Bob, good post: I hope you won’t mind a few “scattered” thoughts on the topic.
    Should we perhaps start a Nature Network group on “citation measures” or “quality indicators”? I occasionally tag and post on the topic at Nautilus and I do go through phases of being good about tagging up Connotea links. But I am afraid it is a bit sporadic. On my wish list of things to do is a forum on quality measures along similar lines to the peer review debate we ran in 2006 – commissioning articles on all aspects of the topic so that all the pros and cons of each can be discussed in a focused way, together with links to good discussions and papers elsewhere. But until that can happen, maybe a Nature Network group would be a place to start a collection of opinions and links? Some of this has gone on already in “publishing in new millennium” and “ask the editor” (and probably others), so you and other users may consider it de trop.
    Only a relatively small number of papers in a journal receive a large number of that journal’s citations, as others have pointed out. Where that leaves someone wanting to choose a journal to which to submit, I don’t know.
    There are various options “out there” or being developed as well as the h index (mentioned above by Brian and also deconstructed in Nature on a few occasions), including a story about this free journal ranking tool called SciImago. Elsevier’s Scopus seems good, and is free at the moment I believe (as well as its code). Google Scholar seems to have gone rather quiet since it launched, but that’s another approach.
    I think there are quite a few bibliometricians, computer scientists etc. making various bibliometric models of citation patterns and so on. I agree the challenge is to get to a point where one is considered an accurate enough measure (and of what?).

  8. Maxine Clarke says:

    PS, Sorry, meant to add, there is a good discussion over at the NN Nature Precedings forum covering some of these (and other) issues.

  9. Martin Fenner says:

    The topic of quality indicators is very interesting and important. It boils down to the questions of what is good science and who is a good scientist. And this of course is relevant for job and grant applications.
    The various discussions we had here and elsewhere indicate a general uneasiness with Impact Factors in particular, but also with the whole concept of trying to quantify scientific output. The discussions about duplicate papers, ghost authorship, self-citations, etc. are all founded on this central issue.
    How do we make progress in these questions? Blog entries and comments are probably not enough. We had the suggestions of a NN Forum and of writing a paper on the topic. Another (obvious) suggestion is a meeting and there are indeed at least two events coming up:
    Second European Conference on Scientific Publishing in Biomedicine and Medicine September 4-6 in Oslo, Norway
    Nordic Workshop on Bibliometrics September 11-12 in Tampere, Finland.

  10. Bob O'Hara says:

    (right, back to this after the squirrels)
    Maxine – thanks for those comments. I might set up a group, as soon as I can find a logo. I’ll have to look into SciImago more later.
    Martin – thanks for those links! The Tampere meeting looks particularly relevant, and it’s also just up the road.

  11. Martin Fenner says:

    Bob,
    if Tampere is too far, you could also check out the 11th European Conference of Medical and Health Libraries. That’s in Helsinki June 23-28.

  12. Cameron Neylon says:

    Stepping well and truly outside my field of expertise but isn’t this a good case for Bayesian modelling? If the argument is that we don’t really know what is going on then taking the approach of building a ‘most probable’ value of a paper could be rather interesting. I have no idea whatsoever as to how one would go about doing it though…

  13. Bob O'Hara says:

    Cameron – that’s the approach I’ll take when I have time! Actually, the key thing is to build the model hierarchically, with the model for each paper depending on some parameters, and these then depending on the journal, author etc. The Bayesian approach is the nicest way of doing this.
    The main problem will be computational: there is a lot of data and so a lot of parameters to estimate. I’m not sure what the solution would be.
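    A rough sketch of that kind of hierarchy (illustrative notation only, not a worked-out model): each paper gets its own rate, and the paper-level parameters are drawn from distributions indexed by journal and author.

    ```latex
    \begin{align*}
      \text{citations to paper } i &\sim \text{Poisson process with rate } \lambda_i(t),\\
      \log \lambda_i(t) &= \theta_i + f(t),\\
      \theta_i &\sim \mathcal{N}\!\left(\mu_{\text{journal}(i)} + \mu_{\text{author}(i)},\ \sigma^2\right).
    \end{align*}
    ```

    Priors would then go on the journal and author effects μ and on the shape of the common age curve f(t).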

  14. Anna Croft says:

    Heh – based on a student thesis I read the other day, we’ll need to include some chaos theory :)
    Regarding key-word searches, there is the added factor of journal availability. So on a search I am less likely to care about which journal it came from, but if it is too obscure and not available online I am less likely to hunt down the article unless it is truly something special – i.e. something I cannot get from an alternative source. So my advice to journals seeking a bigger impact factor: be large and already leading and indispensable (e.g. Nature), be packaged with an indispensable journal, or be free. And definitely be online, because the next generation after me do not bother if it isn’t downloadable (if it’s not on Google, it doesn’t exist).

  15. Cameron Neylon says:

    Bob, I’m glad someone knows what I’m talking about because I certainly don’t! But I think I see what you mean; it’s quite a neat idea.
    How much compute would you need? It’s the kind of thing some of the biblio-analysis e-science people here might be interested in, and we have some fairly serious capacity available (around a thousand nodes in various clusters in total).

  16. Maxine Clarke says:

    Isn’t one of the problems the basic metric, though? Cath points out some changing practices, but any metric based on a web search or download is highly subject to gaming, various auto web spammy spidery things, etc. People criticise citation-based metrics, again with reason: one criticism is that the metric is used inappropriately by others, and another is that the metric itself has flaws (masses of name duplications, address duplications, and others).
    However clever the Bayesian or other statistics used, haven’t you got to have a basic criterion for your analysis, that everyone agrees on and that can’t be abused? (or muddled up).

  17. Bob O'Hara says:

    But, Anna, chaos theory is so 90s! We should be self-organising on networks nowadays, shouldn’t we?
    Cameron – I’m not sure what would be needed. Running a couple of thousand papers won’t be too bad, but it might not scale easily to the whole of Web of Science. I also don’t know enough about what methods could be used: it might be there are some good approximations.
    Maxine – you’re right about all these problems. Duplications and such are a problem for database managers, of course. Gaming statistics is always going to be possible, but some of the simpler methods (e.g. self-citation) could be modelled as well. Of course, it may not be worth it. I guess the message should be “don’t believe the statistics”. That’s a blog post for another day!

  18. Cameron Neylon says:

    Also the minor problem that we’re not allowed the whole of Web of Science to play with. I’m guessing the self-organisation may be required to actually do this analysis, whereas chaos may be required to actually understand it.

  19. Maxine Clarke says:

    I asked above if we should start a NN forum and someone has done it! See Citation in Science. I think it is great to have a focus to discuss the issues in Allan’s topic list (at the link). Please join if you are interested in continuing the conversation there.
