Rates of Scientific Fraud Retractions

Ivan Oransky, on his Retraction Watch blog, pointed to a paper by R. Grant Steen looking at numbers of retractions and whether they were due to fraud or error. Ivan also pointed to a news item on The Great Beyond by Richard Van Noorden examining one slightly surprising claim in the paper: ”American scientists are significantly more prone to engage in data fabrication or falsification than scientists from other countries”. Van Noorden looked at the data in a bit more detail and wasn’t convinced, but didn’t fully run the numbers. So I thought I would.

Here’s the relevant data. The numbers of retractions due to error, fraud, and unknown causes are from the original paper (extracted from PubMed for 2000 to 2009, and categorised by Steen). Some of the total publication data are from The Great Beyond; I extracted the missing figures myself (using the same webpage as Van Noorden). I have also combined the “Asia” and “Other” categories, because I wasn’t going to go through and get the data for every Asian country.

Country	Error	Fraud	Unknown	Total Publications
USA	169	84	7	1819543
China	60	20	9	185786
Japan	41	18	1	377976
India	27	17	6	95718
UK	36	7	2	350760
S Korea	27	8	3	90052
Germany	22	3	0	294164
Australia	13	3	1	131826
Canada	15	2	0	194777
Italy	11	6	0	201922
Turkey	13	2	0	72615
France	12	1	0	181318
Greece	10	0	2	37094
Iran	9	1	1	19696
Others	235	88	33	2057808
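As a quick check before any modelling, here is a minimal Python sketch (an illustration, not the R code used for the analysis below) that turns the table into raw fraud-retraction rates per million papers. The dictionary simply re-types the Fraud and Total Publications columns; `per_million` and `ranking` are hypothetical names.

```python
# (fraud retractions, total publications), re-typed from the table above.
# "Others" combines the Asia and Other categories.
DATA = {
    "USA": (84, 1819543),
    "China": (20, 185786),
    "Japan": (18, 377976),
    "India": (17, 95718),
    "UK": (7, 350760),
    "S Korea": (8, 90052),
    "Germany": (3, 294164),
    "Australia": (3, 131826),
    "Canada": (2, 194777),
    "Italy": (6, 201922),
    "Turkey": (2, 72615),
    "France": (1, 181318),
    "Greece": (0, 37094),
    "Iran": (1, 19696),
    "Others": (88, 2057808),
}


def per_million(count, total):
    """Retractions per million published papers."""
    return 1e6 * count / total


fraud_rates = {c: per_million(f, n) for c, (f, n) in DATA.items()}

# Countries ordered from highest to lowest raw fraud-retraction rate.
ranking = [c for c, _ in sorted(fraud_rates.items(), key=lambda kv: -kv[1])]

if __name__ == "__main__":
    for country in ranking:
        print(f"{country:10s} {fraud_rates[country]:6.1f} per million papers")
```

With these numbers the US comes sixth of the 15 groups; India, China and South Korea have the highest raw rates, although Iran's place just above the US rests on a single retraction.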

Steen, in the original paper, reported the main country comparisons like this:

The results of this study show unequivocally that scientists in
the USA are responsible for more retracted papers than any
other country (table 3). These results suggest that American
scientists are significantly more prone to engage in data fabrication
or falsification than scientists from other countries. There
was no evidence to support a contention that papers submitted
from China or other Asian nations and indexed in PubMed are
more likely to be fraudulent.

We can see that the first sentence is true: the US produced the most retracted papers. But (as Van Noorden noted) the US also produces more papers than most countries, so the other two claims may not follow. Steen apparently tried to remove this effect by normalising by the number of papers retracted due to error. If scientists everywhere produce papers retractable due to error at the same rate, this could be a nice correction, as it would (under a few more assumptions) factor out differences in how often retractable papers are reported and retracted. But there are some big assumptions in there.
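To see why the choice of denominator matters, here is a Python sketch contrasting two normalisations: fraud retractions divided by error retractions (my reading of the kind of correction described above; the exact calculation behind Steen's table may differ) and fraud retractions per million papers published. The variable names are illustrative.

```python
# (error retractions, fraud retractions, total publications) from the table.
DATA = {
    "USA": (169, 84, 1819543),
    "China": (60, 20, 185786),
    "Japan": (41, 18, 377976),
    "India": (27, 17, 95718),
    "UK": (36, 7, 350760),
    "S Korea": (27, 8, 90052),
    "Germany": (22, 3, 294164),
    "Australia": (13, 3, 131826),
    "Canada": (15, 2, 194777),
    "Italy": (11, 6, 201922),
    "Turkey": (13, 2, 72615),
    "France": (12, 1, 181318),
    "Greece": (10, 0, 37094),
    "Iran": (9, 1, 19696),
    "Others": (235, 88, 2057808),
}

# Normalisation 1: fraud retractions per error retraction
# (every country has at least one error retraction, so no division by zero).
fraud_per_error = {c: f / e for c, (e, f, n) in DATA.items()}

# Normalisation 2: fraud retractions per million papers published.
fraud_per_million = {c: 1e6 * f / n for c, (e, f, n) in DATA.items()}

if __name__ == "__main__":
    print(f"{'country':10s} {'fraud/error':>11s} {'fraud/1e6 papers':>17s}")
    for c in DATA:
        print(f"{c:10s} {fraud_per_error[c]:11.2f} {fraud_per_million[c]:17.1f}")
```

By the first measure the US (about 0.50 fraud retractions per error retraction) looks worse than China (about 0.33); by the second, the US has less than half China's fraud-retraction rate per paper. The gap comes from the US having a low rate of retraction for error, which is exactly what the error-based normalisation assumes away.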
Van Noorden calculated the rate of retraction per paper for the top 7 countries, and came to this conclusion:

But this does not mean that any US scientist is more likely to engage in data fraud than a researcher from another country. Indeed, a check on PubMed publications versus retractions for frauds suggests that s/he may be less likely to do so (though the statistical significance of this finding has not yet been tested).

So, time to answer the question of statistical significance. The statistical analysis is fairly simple (here is the R code, if you want it). The next paragraph gives the gory details, so skip it if you like.
Basically, I assume that each paper has a probability of being retracted, and it is constant for every paper from a country. Because the probabilities are so small, it is convenient to treat the number of retractions as a count (i.e. Poisson distributed), with a rate proportional to the total number of papers (technically, this means using the log of the number of papers as an offset). I then use a Poisson regression, which models the rate of retraction on the log scale.
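For the model just described, with a separate coefficient for each country and log(total papers) as an offset, the maximum-likelihood fit has a closed form, which this Python sketch uses instead of a general GLM routine (an illustration, not the R code linked above). Each estimated log rate is log(retractions / papers), and its standard error is 1/√retractions, because the Fisher information for a country's log rate equals its fitted count. A country with zero retractions sits on the boundary at -∞; R's glm() typically reports a large negative estimate with an enormous standard error instead. The function names are hypothetical.

```python
import math

# (fraud retractions, total publications) for a few rows of the table above.
FRAUD = {
    "USA": (84, 1819543),
    "China": (20, 185786),
    "India": (17, 95718),
    "Greece": (0, 37094),
}


def poisson_loglik(beta, count, exposure):
    """Poisson log-likelihood, dropping the log(count!) constant, for
    count ~ Poisson(exposure * exp(beta)); log(exposure) acts as an offset."""
    mu = exposure * math.exp(beta)
    return count * math.log(mu) - mu


def poisson_log_rate(count, exposure):
    """Closed-form MLE and standard error of one country's log rate.

    Setting the derivative of count*(log(exposure) + beta) - exposure*exp(beta)
    to zero gives beta = log(count / exposure). The Fisher information is the
    fitted mean exposure*exp(beta), which equals count at the MLE, so the
    standard error is 1/sqrt(count).
    """
    if count == 0:
        # Boundary case: the likelihood keeps increasing as beta -> -infinity.
        return -math.inf, math.inf
    return math.log(count / exposure), 1.0 / math.sqrt(count)


if __name__ == "__main__":
    for country, (y, n) in FRAUD.items():
        est, se = poisson_log_rate(y, n)
        print(f"{country:7s} log rate {est:8.3f}  SE {se:.3f}")
```

The zero-count case is why Greece misbehaves in the fraud plot: there is no finite maximum to report.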
It’s convenient to plot the results as figures. These are the estimates of the log rate of retraction, with standard errors. First for errors:

The dotted line is the mean rate over all countries. We can see that the US has a comparatively low error rate; indeed, the “western” countries (I’m including Japan in this) tend to have lower rates of retraction due to error. The fraud results are different:

The line for Greece is there because it didn’t have any fraud retractions (the point estimate is -∞ and the estimated standard error is huge): it can be ignored. We can see that the US has a slightly higher estimated rate of retraction due to fraud, which corresponds to about 30% more fraud per paper than average. But China and India have higher rates of retraction due to fraud than the US (and p-value fans will be happy to know that both are statistically significant, with lots of stars to make you happy). China has about 3 times as many fraud retractions per paper as average, and India 5 times as many.
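The comparison above is against the average over all countries. As a rough alternative, this Python sketch compares China and India directly with the US. Under the same Poisson model, the difference in log rates between two countries has a standard error of √(1/y₁ + 1/y₀), which gives a Wald z statistic, and exponentiating a log-rate difference gives the multiplicative difference (so a log-rate difference of log 1.3 ≈ 0.26 is "30% more"). This pairwise test is a swapped-in illustration, not the analysis behind the figures; `compare_to` is a hypothetical name.

```python
import math

# (fraud retractions, total publications) from the table above.
FRAUD = {
    "USA": (84, 1819543),
    "China": (20, 185786),
    "India": (17, 95718),
}


def compare_to(country, baseline="USA"):
    """Rate ratio, Wald z and two-sided p-value for country vs baseline,
    assuming independent Poisson counts with log(papers) offsets."""
    y1, n1 = FRAUD[country]
    y0, n0 = FRAUD[baseline]
    diff = math.log(y1 / n1) - math.log(y0 / n0)  # difference in log rates
    se = math.sqrt(1.0 / y1 + 1.0 / y0)           # Wald SE of that difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return math.exp(diff), z, p


if __name__ == "__main__":
    for c in ("China", "India"):
        ratio, z, p = compare_to(c)
        print(f"{c}: {ratio:.2f}x the US rate, z = {z:.2f}, p = {p:.2g}")
```

With these counts both comparisons land well beyond the conventional 1.96 threshold. The normal approximation is tolerable here because none of the three counts is tiny, but it would be poor for countries with only a handful of retractions.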
What does this mean for fraud and dishonesty? It may not mean that Indian scientists are more dishonest: it may be that they are no more or less honest than anyone else, just that they are caught more often and made to retract. I’ll let others debate that: I have weak opinions, but no more data to back them up.
But Richard Van Noorden was right in his conclusion: the US doesn’t produce the papers most likely to be retracted because of fraud. More generally, one should normalise by the right thing, and also be careful about what is actually being measured: it may not be what you want to measure (here it’s not the rate of fraud but the rate of retraction because of fraud).

Reference

Steen, R. (2010). Retractions in the scientific literature: do authors deliberately commit research fraud? Journal of Medical Ethics DOI: 10.1136/jme.2010.038125

11 Responses to Rates of Scientific Fraud Retractions

GrrlScientist says:

November 18, 2010 at 8:57 am

i’d guess this is impossible to figure out, but i am curious to know what the rate for retraction (due to fraud or error) is for those whose names are on a LOT of papers every year versus those whose names appear on few papers every year.
i’d guess that those who publish 25 papers in a year are more likely to retract than those who publish 1-3 papers per year.
Heather Etchevers says:

November 18, 2010 at 9:02 am

Gosh. I am glad I left blogging. The standards keep shooting through the roof. Better than the papers, because I read this – including the methods, and THANK YOU for explaining your assumptions explicitly – and I wouldn’t have read the paper.

Anyhow, as an American working in France, I also wonder about the outlier in the other direction. You wrote, "It may not mean that Indian scientists are more dishonest: it may be that they are no more or less honest than anyone else, just that they are caught more often and made to retract." Perhaps French scientists are not necessarily more honest, but are caught less often and made to retract less. At a recent INSERM (my employer) meeting, we recruits-of-seven-years learned, most of us for the first time, that we have a Scientific Integrity Delegation that has existed since a particular scandal in 1998, to assist and be a watchdog for researcher conduct.
An interesting subject; thanks for broaching it (again).
Mike Fowler says:

November 18, 2010 at 9:31 am

Great post and analysis, Bob. I liked your attempt to dislodge yourself ever so slightly from the fence, without letting anyone know which way you would fall. Is the average (mean) log-rate really the best point of comparison? How is the raw data distributed?
Another way to present the results is by scaling the rate of fraud by the rate of error. Researchers in the USA make relatively few retractions due to error, but more due to (caught) fraud. How does this differ across countries?
Any chance you want to extend the analysis to see about rates of fraud across other fields? Business, sports, butchery… The sports one could be fun, but easily confounded. All us upright, honest, hardworking Brits who despair at the ‘simulation’ that those sneaky, underhanded, mercenary Europeans have brought into football. Bloody frauds, the lot of them. Bring back Chopper Harris.
Bob O'Hara says:

November 18, 2010 at 11:37 am

Heather – you’re talking yourself down.
It is curious, though, that we’re not told what to do about suspected fraud. I’ve no idea if Finland had a system to deal with it, and haven’t heard about anything in Germany either (but I haven’t been here long).
Mike – the original paper scaled fraud by error, which is (partly) how they concluded the US had a higher rate of fraud. But for that to be valid, you need to make a lot of assumptions about how the rate of error changes across countries. I did that analysis too, but didn’t bother to put it in the blog post: explaining the interactions would get too complicated, I think.
I’m not sure you want to look at fraud in sport: you’ll open a can of worms when 15 Scots are accused of fraudulently claiming to be a national rugby team.
Mike Fowler says:

November 18, 2010 at 11:46 am

There’s no way they’re all Scottish. Our fly half is as Australian as a tinny of XXXX.
Snowy? Snoooowwwwyyyyy!
Bart Penders says:

November 18, 2010 at 12:30 pm

Data points usually show up in multiple publications, all of which would have to be retracted when fraudulent behaviour has been proven (the same goes for genuine errors, obviously). A single act of fraud can thus result in multiple retractions. If fraud is identified quickly, the consequences would be limited. If fraud is discovered late, that same single act of fraud would result in many more retractions.
This leads me to wonder whether the sole difference between the countries on the left and the right of the dotted line is more efficient institutional or organisational safeguards against fraud.
JIRA: Nature Network (Rails) says:

November 18, 2010 at 4:45 pm

[NETWRK-2492] Display of blog title formatting in activity stream on home page

When a blogger uses formatting in their blog post title, this is displayed on the activity stream on the homepage and also on the NN blogs homepage (http://network.nature.com/blogs) e.g. see this blog: http://blogs.nature.com/boboh/2010/11/17/rates-of…
Tom English says:

November 19, 2010 at 6:39 am

Great post, Bob. There is a fantastic article on meta-research in the current issue of The Atlantic, Lies, Damned Lies, and Medical Science. It would make great reading for students at the beginning of various courses.
It happens that I read Feynman’s (1974) Cargo Cult Science, a call to scientific integrity, just today. Fantastic stuff.
The sleaze that goes unreported is incredible. For instance, I review lots of papers on function optimization, and am amazed at the obscure test functions the authors come up with to make their "promising" algorithms outperform some conventional algorithm. I used to give the authors the benefit of the doubt, but now I recommend rejection. This probably makes conference organizers unhappy, because it’s generally "the more, the merrier" for them.
Bob O'Hara says:

November 19, 2010 at 9:53 am

Thanks, Tom. When one of those papers uses active information you know it’s time to retire.
Frank Norman says:

November 20, 2010 at 7:27 pm

It’s tricky trying to decide what is being measured. Is it the individual scientist’s integrity/ability, or the institutional efficiency and ethical stance, or national standards?
How does the rate of retraction vary with journal readership – perhaps taking impact factor as a rough proxy for readership? Is a fraudulent paper in a widely-read journal more likely to be retracted than one in an obscure journal? Or is it less likely to have got through peer-review in the first place in a more widely-read journal?
Bart’s point is interesting too.
Colin Mathieu says:

February 4, 2011 at 5:14 am

Thanks for the great paper, Bob. It’s not surprising to see such numbers; nowadays more and more people will just write complete nonsense in order to generate views, which in turn generate revenue.
Coming from the gaming community, it’s quite common to see fraudulent information about a variety of topics; the whole purpose is obviously to get people to buy more.
Colin