A couple of weeks ago Stephen (of this parish) generated a lot of discussion when he complained about the journal impact factor (JIF). I must admit I feel a bit sorry for the JIF. It’s certainly not perfect, but it’s clear that a lot of the problems aren’t with the statistic itself, but rather with the way it is used. Basically, people take it too seriously.
The JIF is used as a measure of the quality of science: it’s used to assess people and departments, not journals. And this can affect funding decisions, and hence people’s careers. If we are to use a numerical metric to make judgements about the quality of science being done, then we need to be sure that it is actually measuring quality. The complaint is that the JIF doesn’t do a good job of this.
But a strong argument can be made that we do need some sort of measure of quality. There are times when we can’t judge a person or a department by reading all of their papers: a few years ago I applied for a job for which there were over 600 applicants. Imagine trying to read 3 or 4 papers from each applicant to get an idea of how good they are.
Enter the altmetrics community. They are arguing that they can replace the JIF with better, alternative metrics. They are trying to develop these metrics using new sources of information that could measure scientific worth: online (and hence easily available) sources like Twitter and Facebook (OK, and also Mendeley and Zotero, which make more sense).
Now, I have a few concerns about altmetrics: they seem to be concentrating on using data that is easily accessible and which can be accumulated quickly, which suggests that they are interested in work which is quickly recognised as important. Ironically, one of the criticisms of the JIF is that it only has a two-year window, so it downgrades subjects (like the ones I work in) which have a longer attention span.
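For anyone who hasn’t seen it spelled out, the JIF for a journal in year Y is just the citations received in Y to items the journal published in the two previous years, divided by the number of citable items from those two years. Here is a minimal sketch of that calculation, with a hypothetical journal and made-up numbers:

```python
# A minimal sketch of the standard two-year JIF calculation.
# All of the numbers below are invented, purely for illustration.

def impact_factor(citations_received_in_year_y, citable_items_previous_two_years):
    """JIF for year Y: citations received in Y to items published in
    Y-1 and Y-2, divided by the number of citable items from Y-1 and Y-2."""
    return citations_received_in_year_y / citable_items_previous_two_years

# A hypothetical journal: 480 citations in 2012 to papers it published in
# 2010-2011, out of 200 citable items from those two years.
print(impact_factor(480, 200))  # 2.4

# A paper that only starts being cited three or more years after publication
# never enters this ratio at all.
```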
But I also have a deeper concern, and one I haven’t seen discussed. It’s a problem that, if it is not solved, utterly undermines the altmetrics programme. It’s that we have no concrete idea what it is they are trying to measure.
The problem is that we want our metrics to capture some essence of the influence/impact/importance that a paper has on science, and also on the wider world. But what do we mean by “influence”? It’s a very vague concept, so how can we operationalise it? The JIF does this by assuming that influence = number of citations. This has some logic, although it narrows the concept of influence a lot. It also assumes that all citations are equal, irrespective of the reason for citation or where the citing paper is published. But in reality these things probably matter: being cited in an important paper is worth more than being cited in a crappy paper that nobody is going to read.
But what about comparing a paper that has been cited once in a Very Important Paper to one that has been cited three times in more trivial works? Which one is more important? In other words, how do we weight the number of citations against the importance of where they are cited to measure influence?
I can’t see how we can even start to do this if we don’t have any operational definition of influence. Without that, how can we know whether any weighting is correct? Sure, we can produce some summary statistics, but if we don’t even know what we’re trying to measure, how can we begin to assess if we’re measuring it well?
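To see why this matters, here is a toy sketch (the papers, citation counts, and venue weights are all invented): give a citation from a Very Important venue some extra weight, and the ranking of the two papers above flips depending on how much extra weight you choose, with nothing to tell us which choice is right.

```python
# Toy illustration: which paper looks "more influential" depends entirely on
# how much extra weight a citation from an important venue gets.
# All values here are invented.

def weighted_citations(citing_venues_important, venue_weight):
    """Sum of citation weights: 1 for an ordinary venue,
    venue_weight for a Very Important one."""
    return sum(venue_weight if important else 1
               for important in citing_venues_important)

paper_a = [True]                  # cited once, in a Very Important Paper
paper_b = [False, False, False]   # cited three times, in trivial works

for w in (1, 2, 5):
    a = weighted_citations(paper_a, w)
    b = weighted_citations(paper_b, w)
    print(f"venue weight {w}: A scores {a}, B scores {b}")

# With weights of 1 or 2 paper B comes out ahead; with a weight of 5 paper A
# does. Without an operational definition of influence there is no way to
# justify one weight, and hence one ranking, over another.
```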
I’ve sketched the problem in the context of citations, but it gets even worse when we look at altmetrics. How do we compare tweets, Facebook likes, Mendeley uploads, etc.? Are 20 tweets the same as one Mendeley upload? Again, how can we tell if we can’t even explicate what we are measuring?
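The same toy exercise with altmetrics data makes the point (all the names, counts, and weights below are made up): any composite score is just a weighted sum over counts that have no common unit, and the ranking it produces depends entirely on weights plucked out of thin air.

```python
# Toy composite "altmetric" score: a weighted sum over counts with no common
# unit. Every number and weight below is made up.

def altmetric_score(counts, weights):
    return sum(weights[k] * counts.get(k, 0) for k in weights)

paper_x = {"tweets": 200, "facebook_likes": 40, "mendeley_uploads": 5}
paper_y = {"tweets": 10,  "facebook_likes": 2,  "mendeley_uploads": 60}

equal = {"tweets": 1, "facebook_likes": 1, "mendeley_uploads": 1}
uploads_up = {"tweets": 0.05, "facebook_likes": 0.1, "mendeley_uploads": 1}  # 20 tweets = 1 upload

for label, w in [("equal weights", equal), ("uploads weighted up", uploads_up)]:
    print(label, altmetric_score(paper_x, w), altmetric_score(paper_y, w))

# With equal weights paper X wins easily (245 vs 72); weight Mendeley uploads
# more heavily and paper Y wins (60.7 vs 19). Neither weighting can be
# defended without saying what "influence" actually means.
```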
If someone can explain how to do this, then great. But I’m sceptical that it’s even possible: I can’t see how to start. And if it isn’t possible, then what’s the point of developing altmetrics? Shouldn’t we just ditch all metrics and get on with judging scientific output more qualitatively?
Unfortunately, that’s not a realistic option: as I pointed out above, with the amount of science being done, we have to use some form of numerical summary, i.e. some sort of metric. So we’re stuck with the JIF or other metrics, and we can’t even decide if they’re any good.
Bugger.