Ivan Oransky, on his Retraction Watch blog, pointed to a paper by R. Grant Steen looking at numbers of retractions and whether they were due to fraud or error. Ivan pointed to a news item on The Great Beyond by Richard Van Noorden looking at one slightly surprising claim in the paper: “American scientists are significantly more prone to engage in data fabrication or falsification than scientists from other countries”. Van Noorden looked at the data in a bit more detail and wasn’t convinced, but didn’t fully run the numbers. So I thought I would.
Here’s the relevant data. The numbers of retractions due to error, fraud, and unknown causes are from the original paper (extracted from PubMed for 2000 to 2009, and categorised by Steen). Some of the total publication data is from The Great Beyond; I extracted the missing totals using the same webpage as Van Noorden. I have also combined the “Asia” and “Other” categories, because I wasn’t going to go through and get the data for every Asian country.
Steen, in the original paper, reported the main country comparisons like this:
The results of this study show unequivocally that scientists in the USA are responsible for more retracted papers than any other country (table 3). These results suggest that American scientists are significantly more prone to engage in data fabrication or falsification than scientists from other countries. There was no evidence to support a contention that papers submitted from China or other Asian nations and indexed in PubMed are more likely to be fraudulent.
We can see that the first sentence is true: the US produced the most retracted papers. But (as Van Noorden noted) the US also produces more papers than most countries, so the claims that follow from the raw counts may not hold once you account for that. Steen apparently tried to remove this effect by normalising by the number of papers retracted due to error. If scientists produce papers retractable due to error at a constant rate, then this could be a nice correction, as it would (under a few more assumptions) factor out the rate of reporting retractable papers. But there are some big assumptions in there.
Van Noorden calculated the rate of retraction per paper for the top 7 countries, and came to this conclusion:
But this does not mean that any US scientist is more likely to engage in data fraud than a researcher from another country. Indeed, a check on PubMed publications versus retractions for frauds suggests that s/he may be less likely to do so (though the statistical significance of this finding has not yet been tested).
So, time to answer the question of statistical significance. The statistical analysis is fairly simple (here is the R code, if you want it): the next paragraph gives the gory details, so skip it if you prefer.
Basically, I assume that each paper has a probability of being retracted, and it is constant for every paper from a country. Because the probabilities are so small, it is convenient to treat the number of retractions as a count (i.e. Poisson distributed), with a rate proportional to the total number of papers (technically, this means using the log of the number of papers as an offset). I then use a Poisson regression, which models the rate of retraction on the log scale.
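The original analysis was done in R (linked above), but the core of the offset idea is easy to sketch. For a Poisson count k of retractions with exposure n papers, the maximum-likelihood estimate of the log rate is log(k/n), and its approximate (delta-method) standard error is 1/√k — which is what the per-country points and error bars in the figures below represent. A minimal stdlib-Python sketch, using made-up counts since the data table isn’t reproduced here:

```python
import math

# Hypothetical counts, for illustration only -- not the paper's data.
# k = retractions due to fraud, n = total PubMed papers, 2000-2009.
data = {
    "USA":   {"k": 84, "n": 2_000_000},
    "China": {"k": 20, "n": 250_000},
}

estimates = {}
for country, d in data.items():
    k, n = d["k"], d["n"]
    log_rate = math.log(k / n)   # MLE of the log retraction rate
    se = 1 / math.sqrt(k)        # approximate SE on the log scale
    estimates[country] = (log_rate, se)
    print(f"{country:5s}: log rate {log_rate:6.2f} +/- {se:.2f}")
```

A full Poisson regression fits all countries at once with log(n) as the offset, but for a single country it reduces to exactly this calculation.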
It’s convenient to plot the results in figures. These are the estimates of the log rate of retraction, with standard errors. First for errors:
The dotted line is the mean rate over all countries. We can see that the US has a comparatively low error rate, indeed the “western” countries (I’m including Japan in this) tend to have lower rates of retraction due to error. The fraud results are different:
The line for Greece is because it didn’t have any errors (the point estimate is -∞ and the estimated standard errors are pretty big too): that can be ignored. We can see that the US has a slightly higher estimated rate of retraction due to fraud, which corresponds to about 30% more fraud per paper than average. But China and India have higher rates of retraction due to fraud than the US (and p-value fans will be happy to know that both are statistically significant, with lots of stars to make you happy). China has about 3 times as many fraud retractions per paper as average, and India 5 times as many.
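For a pairwise comparison like China versus the US, the significance test behind those stars can be sketched as an approximate z-test on the log rate ratio of two Poisson counts. Again, the counts below are invented purely for illustration:

```python
import math

def poisson_rate_ratio_test(k1, n1, k2, n2):
    """Approximate z-test for the ratio of two Poisson rates.

    k = retraction count, n = total papers (the exposure)."""
    log_rr = math.log((k1 / n1) / (k2 / n2))
    se = math.sqrt(1 / k1 + 1 / k2)  # delta-method SE of the log rate ratio
    return log_rr, log_rr / se

# Hypothetical counts: 20 frauds in 250,000 papers vs 84 in 2,000,000.
log_rr, z = poisson_rate_ratio_test(20, 250_000, 84, 2_000_000)
print(f"rate ratio = {math.exp(log_rr):.1f}, z = {z:.1f}")
```

With a rate ratio near 2 and |z| > 1.96, such a difference would clear the conventional 5% threshold; the stars in the regression output come from the same kind of Wald statistic.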
What does this mean for fraud and dishonesty? It may not mean that Indian scientists are more dishonest: it may be that they are no more or less honest than anyone else, just that they are caught more often and made to retract. I’ll let others debate that: I have weak opinions, but no more data to back them up.
But Richard Van Noorden was right in his conclusions: the US doesn’t produce the papers most likely to be retracted because of fraud. More generally, one should normalise by the right thing – and also be careful about what you’re actually measuring: it may not be what you want to measure (here it’s not the rate of fraud but the rate of retraction because of fraud).
Steen, R. (2010). Retractions in the scientific literature: do authors deliberately commit research fraud? Journal of Medical Ethics DOI: 10.1136/jme.2010.038125