On making shit up

Benoit’s comment on Jenny’s blog reminds me of the time that I was scooped, but not because my boss was carefree with data. I’ll tell you about that some other time, though, because there’s another issue he mentions that’s recently become very pertinent.

One of the issues that crops up occasionally at the day job is retractions. Because we highlight the ‘best’ in the biological and medical literature it’s critical that we do something whenever a paper is retracted, for whatever reason. The reasons can be very good ones, such as conclusions becoming untenable in the light of new evidence, or irreproducible results; or the retraction can follow deliberate or accidental fraud or misconduct.

There’s nothing inherently wrong with retractions: they’re part of science’s self-correcting mechanism (although we could all use a little more openness). Either way, when a paper is retracted it doesn’t just disappear: the paper itself and the retraction text remain part of the scientific record. As, indeed, do the evaluations we publish. We (i.e. f1000) need to maintain our records for the sake of posterity; we don’t simply delete evaluations of retracted papers.

Now that’s out of the way, Benoit’s comment about PIs:

the PI’s job in those cases is to talk to the competition, figure out where everyone is at […] and make arrangements for co-publication, for example. It is no accident that a recent issue of Nature had 5 papers on the same iPS cell result, or that the cover of Nature was shared years ago by two articles on the role of lunatic fringe in chick limb development

reminds me of a story that started—at least in public—three years ago and is only just nearing closure.

Not being particularly interested in immunology, my response to the publication in Nature of three papers (by Janssen, Wiesmann and Abdul Ajees et al.) reporting the structural basis for the activation of complement C3b was little more than an “ooh, that’s pretty” together with a raised eyebrow at an 80 Å shift in conformation. (I should add, here, that it’s not surprising to see three such major pieces of work published simultaneously in the same journal. As Benoit says, PIs talk, with each other and with journal editors, and publication can be delayed a little, or hurried. Which I suspect has not a little bearing on the current story.)

I would have paid it no further attention were it not for our soon-to-be departmental head, sometime in the waning of 2007, highlighting a rather intriguing Communication in Nature. Two of the authors on one of those three C3b papers, together with a couple of big guns in protein crystallography (one of whom was responsible for the gazumping I mentioned earlier), put their collective hand in the air and said, “Hang on, there’s something fishy with this third paper”:

We have reanalysed the data deposited by Ajees et al. and have discovered features that are inconsistent with the known physical properties of macromolecular structures and their diffraction data. Our findings therefore call into question the crystal structure for C3b reported by Ajees et al.

Them’s fighting words. The Communication gets a little technical (this is protein crystallography, after all), but I’ll try to explain it.

These guys decided, as you might expect, to compare the three structures. A sensible move, especially seeing as the published structures are somewhat dissimilar. Indeed, the Ajees structure is missing a huge chunk of molecule!

The coordinates do not form a connected network of molecules in the crystal lattice. The crystal structure forms layers that are separated by a large void in the c-direction (a slab of about 30–40 Å thick that spans the entire unit cell).

Sometimes it’s actually difficult to ‘find’ molecules in large, complex protein structures—if the software you use doesn’t assign electron density correctly for some reason, or the phases aren’t complete, you can fail to join the dots properly, and in the rush to publication this can be missed. So these guys took the deposited data and tried to solve the structure again, looking for the putative missing protein molecule. And they didn’t find it.

Furthermore, they

noticed other physically implausible features

such as no data indicating that the crystals contained water. In other words, this protein was in a vacuum; which might make theoretical physicists happy but isn’t really consistent with anything we know about biology (and Henry, you can stop the Nature abhors a vacuum joke right now).


[Figure: Killer figure]

They also found that the R factor, essentially a measure of the ‘goodness’ of a protein structure, doesn’t behave as you would expect if no water was present in the calculation. Another measure of structure goodness, the B factor, looked odd too. This is a measure of how much any given atom might be expected to move; you might predict that atoms on the outside of a protein have more freedom than those inside: so the B factor should vary along the sequence. Right?

Eh. Right. In the Ajees structure, the B factors are pretty much the same across the entire length of the protein, which when you consider that the structure has vast swathes exposed to solvent (or vacuum, perhaps) is somewhat puzzling.


[Figure: somewhat puzzling]
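If you want to see for yourself what ‘pretty much the same’ means, here is a minimal sketch in Python. It is not the analysis from the Communication, just a quick sanity check of my own devising: the function names are made up for illustration, and it assumes a standard fixed-column PDB coordinate file, in which the B factor of each atom sits in columns 61 to 66 of its ATOM record.

```python
from statistics import mean, pstdev


def ca_b_factors(pdb_lines):
    """Collect the B factor of every C-alpha atom from PDB-format ATOM records."""
    values = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16,
        # temperature (B) factor in columns 61-66 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def relative_spread(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(values) / mean(values)
```

Feed `ca_b_factors` the lines of a coordinate file and pass the result to `relative_spread`. A number close to zero means the B factors barely change along the chain, which is the flat line that should make you raise an eyebrow; I have deliberately not suggested a cut-off, because what counts as ‘too flat’ depends on the protein and the resolution.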

In brief, the structure is just too good:

We think that these physically implausible features undermine the validity of the model presented by Ajees et al. and the deposited diffraction data from which it derives. Only when the experimental diffraction images are made available can the deviating C3b model be either verified or falsified.

There’s a bit of a pathetic response from Ajees and co., but basically we (as in the crystallography community) waited to see what would happen next.

<eats popcorn>

What did happen next was a veritable storm on the CCP4bb mailing list, kicked off by Eleanor Dodson and titled The importance of USING our validation tools. That’s all it took for Randy Read (one of the Big Four) to weigh in with

Originally I expected that the publication of our Brief Communication in Nature would stimulate a lot of discussion on the bulletin board, but clearly it hasn’t. One reason is probably that we couldn’t be as forthright as we wished to be. For its own good reasons, Nature did not allow us to use the word “fabricated”.

That, and the necessity of waiting for Ajees et al. to respond, is probably why it took eight months for the Communication to be published. Anyway, rather than clutter up this space, I commend to you this zip archive of the CCP4bb discussion, kindly collated by Hari Jayaram.

Why am I talking about this now? Because, as reported by Iddo Friedberg, the University of Alabama at Birmingham has (finally) requested that no fewer than eleven structures, represented in nine papers, from the lab that published the apparently dodgy structure be expunged from the Protein Data Bank:

After a thorough examination of the available data, which included a re-analysis of each structure alleged to have been fabricated, the committee found a preponderance of evidence that structures 1BEF, 1CMW, 1DF9/2QID, 1G40, 1G44, 1L6L, 2OU1, 1RID, 1Y8E, 2A01, and 2HR0 were more likely than not falsified and/or fabricated and recommended that they be removed from the public record.

My spy in Sydney tells me that the Journal of Biological Chemistry has retracted one paper, and that the Journal of Molecular Biology and Acta Crystallographica. He also says that Birmingham (Alabama) were slow in informing all the parties, which has delayed final actions. I also understand that journals are moving towards requesting that raw diffraction images (even if they’re as bad as Stephen’s) rather than just structure factors be deposited somewhere, so that we can all keep a big, brotherly eye on each other. And finally, “it is interesting that [the head of the lab] still claims he is innocent”.

It’s funny that two years ago I suspected that Nature received manuscripts from Janssen and Wiesmann in reasonably quick succession and, knowing that the Ajees lab was working on the same thing, poked them with a “Have you got that structure yet?”-type call. Alternatively, I thought, perhaps the Ajees group had seen presentations by the other two and realized they were about to be pipped. In either case, I reasoned, they might have had a structure almost ready, or poor crystals, and maybe they made something up.

Not good, not ethical; but possibly, just possibly, understandable. But to apparently fabricate eleven structures? That’s seriously bad, and implies a level of collusion and—well, let’s be frank here—organized conspiracy unthought of outside certain labs in South Korea.

In the meantime, as soon as the news from Alabama hit my screen, I pulled up the f1000 website and searched for the implicated articles. I’m pleased to say that we now reflect the current state of the scientific record (as you may verify for yourself at this free link).


(Cross-posted)

About rpg

Scientist, poet, gadfly

45 Responses to On making shit up

  1. Kate Grant says:

    Wah hah hah, tags are hilarious! Article rather poignant too.

  2. Heather Etchevers says:

    Well no wonder that took you a while to incubate. That’s a humdinger!
    The link in the “pathetic response” is missing an integer “1” – here it is.
    Interesting – and painful – story. Not being anywhere close to crystallography, I was blissfully unaware. Alabama has let this person go, and where are they now, I wonder?

  3. Matt Brown says:

    This is a staggering tale. Leaving aside that the authors (allegedly) made stuff up and got it published in Nature and elsewhere, imagine doing all the associated seminars and conference appearances knowing that you were talking bollox to an expert audience. How?

  4. Lou Woodley says:

    Thanks for a great technical explanation of the data – it’s not something I would probably have taken the time to understand otherwise! Really helpful.

  5. Richard P. Grant says:

    Thanks for the positive feedback. And thanks for pointing out the error, Heather–I’ll go back and fix it.

  6. Jennifer Rohn says:

    I think it’s important to note that the alleged falsification was not necessarily deliberate. For a great illustration of how one’s perception of data can be warped by desire, I would refer people to Allegra Goodman’s excellent lablit novel, ‘Intuition’.

  7. Richard P. Grant says:

    That’s very true. In this case it’s difficult to conceive of how such a structure could actually come from anything approaching a physical object. Admittedly structural models are just that–models– but that flat line graph is a real killer: we’re all puzzled how that could come about by accident. It does rather raise the question of what the chuff the reviewers were thinking. Eleanor’s comment about the validation tools (which are really quite sophisticated if loathed by people trying to deposit structures).
    Again, here, it seems to be a mix of a sexy result and pressure to publish. Never a good mix.

  8. Åsa Karlström says:

    wow. that’s an interesting story. Thanks Richard for explaining the backdrop a bit more for people, like me, who aren’t that knowledgeable in the R-factors (apart from slopes in general).
    I would agree with your comment Jenny, although 11 structures points to a bigger problem than just “over stating” imho. Clearly, it is something that people are talking about when it comes to some of these amazing finds. How many did look at the raw data files and how many did try and calculate/check/verify the findings?!
    The pressure to publish, maybe even more as a post doc, is always something that will tempt people…. especially now when the N/C/S publications are so crucial for the careers [I’m sure they were important before too but I do think they have gotten even more important nowadays as a sorting tool for agencies]

  9. Richard P. Grant says:

    Essentially, Åsa, you take diffraction images and scan them to get structure factors. These are often deposited and can be checked (although reviewers will usually just check the stats from the next stage). The structure factors are then used to calculate the structure. Now, the critical thing is that, starting with a structure, you can fake structure factors. But you can’t fake the image data. Even then, a good fake is probably more work than solving a structure for real…
    Stephen?

  10. Åsa Karlström says:

    Richard> however, wouldn’t it be like the ever discussed Western blots and PCR bands… if you really want to cheat, you load something else on the gel in order for the right sized band to come up, rather than cut and paste the wrong ladder.
    I was merely wondering, along the lines of making an argument for “an honest mistake” how many people would’ve looked at the diffraction images and done the recalculations… I guess though, from what you are saying, that these images maybe never existed since the data supported proteins in vacuum without water (as in not possible to exist)?

  11. Richard P. Grant says:

    that’s basically it, yeah. It’s not simply a case of splicing in structure factors or electron density like bands on a gel: the analysis by Randy and friends implies that this is what they tried to do, but got caught out by the sheer complexity of the system. You can’t, for example, take some images from a known structure and use them instead. Your Rmerge would be ridiculous. (in fact, some people don’t publish their Rs. This seems strange to me: see Table 1 at this JMB paper).
    Because structures are models of reality, we have to be very careful about our sanity-checking. Which is where the R and B factors come in (they’re also useful for telling us when we’re on the right track in the honest refining of a structure). And if you were to try and fabricate images then… well, you’d have to add random noise and all sorts to make it look real.
    If they really fabricated 11 structures, and no one noticed, they were pretty damned good at it: seems a terrible waste of talent to me.

  12. Henry Gee says:

    (and Henry, you can stop the Nature abhors a vacuum joke right now)
    I’m flatus flattered that you thought I could even follow your exegesis as far as that. But I have a more general question, which, in your lofty Informational Architecture position, you might be qualified to answer, and that’s this –
    Are retractions getting more common, when corrected for the increasing volume of papers? If so, why?

  13. Richard P. Grant says:

    A question I’ve often considered. Not sure if anyone would pay me to find the answer though.

  14. Stephen Curry says:

    I think it is very difficult to see this as a case of accidental fabrication driven by a desire to see what you want to see, (which is a temptation all of us might feel). As was pointed out in the Communication in Nature that Richard links to, the overwhelming impression is that the supposedly ‘raw’ data (the structure factors – basically the square-root of the intensity of the spots in the diffraction pattern) look very suspect. Moreover the packing of the molecules in the crystal is physically unrealistic, as are the B-factors for very exposed parts of the structure. The explanations that the authors offer for these observations are unconvincing.
    As Richard says (a remark that echoes some of the comments in the CCP4bb thread), it would be as much work to fabricate the data set and structure as to solve it in the first place. Indeed to do so, you would have to know at least as much crystallography as a competent crystallographer. This raises the fascinating psychological question of what was going through Krishna-Murthy’s mind. (That said, it is difficult to see how they could have generated a plausible model completely from scratch; perhaps they had poorly diffracting crystals which gave low-resolution information but ‘pretended’ they were high resolution? Not sure – would need to dig into this a bit more).
    I suspect that, to some degree he may have been relying on the relative ease with which one can get such data past the pdb checks and any journal reviewers. Few reviewers would go to the trouble of demanding the structure factors and inspecting maps and models of structures for themselves. They rely on summarised tables of statistics provided by the authors. The CCP4bb discussion points to this and, thankfully, highlights one of the most fantastic aspects of the scientific enterprise – the ability to self-correct. Already there is a vigorous discussion on what measures need to be put in place to prevent the recurrence of such depositions. (This is a set of emails of which the community can be proud!)

  15. Stephen Curry says:

    Oh, and our (real!) diffraction images were beautiful!

  16. Richard P. Grant says:

    I’m sure they were—I just couldn’t find them on NN 🙂 And thanks for your comments.

  17. Henry Gee says:

    I have a story of an apparent fabrication that very nearly got into press – but as it’s a bit long and slightly off-topic, I’ve posted it separately

  18. Kyrsten Jensen says:

    I’d rather have retractions any day (and really, you should admit when you are wrong) over papers with incorrect messages in them. I’m now a stickler for Materials and Methods: incorrectly cited manufacturers, or missing or incorrectly cited catalog numbers, make my job that much harder. I need to know how to replicate so I can help someone else.
    One example: a person called asking if they could get product Y, which was a custom product. No problem! I take a look at the paper myself, and realize that the antibodies the authors cite as a constituent of their custom product are actually not what shipped to that lab. Ever. In fact, if you had ordered the product based on what the authors cited, you’d not get the same experimental results! Thankfully, I knew the group well enough from my time as a sales rep (yes, I worked for the “dark side”) to know that the PI was quite nice, and sent a quick email his way to point out the discrepancy, in the nicest language possible. He was extremely nice and agreed to let me tell the other person that in fact, they needed to order Z and not Y. He said that as long as it was published, it was in the public domain and he wanted others to be able to reproduce it. Of course, this is a major tenet of publishing!
    I can’t say I wasn’t prepared for a fight though – as the person asking about buying the product based on the publication could have been a competitor. I did mention that possibility to the PI, and he basically said “Bring it on!”
    I am still amazed though at how many mistakes can get through the process, mostly by sheer mistake. I’m not talking about Nature articles, of course 😉 Makes me wonder how I was able to replicate any experiments when I was doing my MSc.

  19. Cath Ennis says:

    Kyrsten, that’s quite a mistake! I’m glad you picked it up, and I bet your customer was too (do you guys get bonuses for this kind of awesomeness yet?!)
    I’m ashamed to admit this, but I love reading retractions. Scadenfreude at it’s geekiest.

  20. Richard P. Grant says:

    oh me too! And I especially love misplaced apostrophes!

  21. Cath Ennis says:

    D’oh!
    Can I retract that apostrophe? You can all laugh at me and everyone will be happy.

  22. Richard P. Grant says:

    OK, but as long as you leave the misspelling of schadenfreude.

  23. Cath Ennis says:

    Oh FFS.

  24. Richard P. Grant says:

    Peer review at its finest, this.

  25. Kyrsten Jensen says:

    In order to get bonus’ed for that kind of awesomeness, someone would have to recognize the awesomeness. I file away each and every single positive customer email where they say “you saved my project!” or “you taught me everything I know about X!”.

  26. Cath Ennis says:

    I managed to work there for the only two years in the company’s whole history that no bonuses were paid.
    Good for you on tracking positive feedback! I do the same, and email everything to my Gmail account every couple of months.

  27. Richard P. Grant says:

    Oh Cath. Perhaps it was your fault?

  28. Cath Ennis says:

    Well, I got the maximum possible (i.e. not very big) raise each year, so… unlikely, apostrophe abuse notwithstanding.

  29. Richard P. Grant says:

    Nicely recovered, Cath. I’ll recommend you for a raise any time you like.

  30. Benoit Bruneau says:

    On top of the hastily fabricated/slapped together papers that happen to show up at the same time as others, here’s one delightful story (and a half): a particularly well-known PI received a call from a less “prominent” colleague, who had some results on gene X2, which he suspected Mr Prominent was working on, and asked if he could not work on it (riiiight, that usually works), or if he was, coordinate publication. Turns out Mr Prominent was not working on it at all, but got a couple of postdocs on it asafp, and in record time (less than four weeks if I recall correctly) cranked out a paper that showed up in a journal (beginning with C), along with three other papers….while scooping a fourth group.
    Their previous work on gene X1 was requested by another journal (beginning with S) who had heard that Nature was going to publish something important (yes, I know), and might Mr Prominent have something vaguely similar that he could slap together; again, two weeks later, paper accepted.
    Nice way to get stuff published!

  31. Richard P. Grant says:

    Wow. It would be nice to be that prominent, although I suspect one wouldn’t want to walk down too many dark alleyways.

  32. Richard Wintle says:

    Benoit – most excellent story, that. Now you’ve got me wondering about the identity of Mr. Prominent… I have a few suspicions… 😉
    I am certain that one of the original papers describing microsatellite polymorphisms in the human genome described a hyb mix that included 5% v/v O’Darby’s Irish Cream Liqueur.
    And… AHA! The miracle of Teh IntarWebz has revealed it… right about here
    I’d love to know the story… did it make sense to put the stuff in (condensed milk, a bit of sugar, some ethanol… helpful in hybridization?), or was there some carousing going on in the lab late one evening and some got in there “by mistake”? Or is this a joke on the editors of Am J Hum Genet?

  33. Kyrsten Jensen says:

    @Richard W: that paper is truly awesome! I’ve used all sorts of weird and wonderful things in the lab, but never thought to raid the liquor cabinet (which we had in the lab…don’t ask).
    I think scooping is rather widespread, and people are terrified about it. It’s not entirely unusual for me to work with a customer, but still not have them able to disclose certain details to me in order to help them. The problem is, I need to know some things in order to troubleshoot. I’ve never had to sign a confidentiality agreement, but it’s not uncommon for me to spend nearly 1/2 hr convincing someone that yes, I need to know WHY they added bFGF at the concentration they did, or something similar. Really, I tell them that I really couldn’t care less about telling others about their project, I just want them to be able to get it to work.

  34. Darren Saunders says:

    scratching my head
    With so many structures and papers involved, there must have been a procession of students and post-docs involved in this over a number of years… how did they get away with it for so long? And where are they all now?
    As for multiple papers magically appearing in the same or multiple journals at the same time (ie the 5 papers on p53 in iPS reprogramming earlier this year) it can often be illuminating to look at the initial submission dates on the various versions. There’s a great example I can think of from earlier this year where one of the 3-4 concurrent papers on a particularly “hot” subject had been initially submitted from a smaller lab ~18 months before any of the others, which were curiously from high profile labs. My instant suspicion was that the later papers had most likely appeared out of the labs of the various high profile referees in response to seeing the first. Hmmm

  35. Richard P. Grant says:

    That paper is full of AWESOMESAUCE.
    I don’t know, Darren. It puzzles the heck out of me. Actually, I should go and check the submission dates. Be right back.

  36. Richard P. Grant says:

    Author     Submission   Acceptance
    Janssen    09 Jul       21 Aug
    Ajees      20 Jul       18 Sep
    Wiesmann   21 Aug       21 Sep
    Those are very rapid, but I don’t know if you can draw any conclusions. Noteworthy that the dodgy paper took twice as long as the others, though.

  37. Sean Seaver says:

    Hi Richard,
    I have been blogging about these structures/papers all week. My initial reaction to the issue was similar to Jennifer’s in that it may have not been deliberate. Sadly, after looking at all the structures, I have a hard time believing they were not fabricated.
    Examples:
    1BEF is from a synthetic source (post), but what has disturbed me more is the time line of events (post). In addition, the B factors for PDB entry 1CMW were derived by subtracting 16 from entry 1TAQ. The list goes on …

  38. Richard P. Grant says:

    Hi Sean! Thanks for commenting, and the links (I must apologize for not linking to you in the post—I already had a plethora in there and although I knew of your excellent site I was a little pushed for time).
    If anyone wants a more technical explanation of what’s been happening, they could do worse than looking at http://www.p212121.com/ !

  39. Sean Seaver says:

    Hi Richard,
    No worries about the links. I am humbled that you even knew of my site. I felt a little awkward posting links to my site on your blog, but am glad to hear that you found them useful.
    These structures do make for interesting stories

  40. Heather Etchevers says:

    Wow. Here’s another example with lots of human drama. All that’s missing is sex and drugs – and maybe they are not. Those who are behind paywalls are welcome to contact me directly.

  41. Austin Elliott says:

    Via Stephen Curry’s Tweeting, I hear that this has now surfaced as a full sh*t/fan/splatter moment over on Nature News.

  42. Kyrsten Jensen says:

    @austin: there are some VERY interesting comments on that piece now. Check it out. I’m not entirely sure what Mr. Kotwal is going on about…?

  43. Richard P. Grant says:

    Heh. Sounds like it could be libellous to me. ‘Maliciously’? That’s imputing a motive, surely? Where are the lawyers now?

  44. Maxine Clarke says:

    I doubt anyone is still reading, but if anyone is, I’ve reposted a Nature Correspondence at Nautilus (http://blogs.nature.com/nautilus/2010/02/protein_data_bank_policies_for.html) by the PDB which clarifies their procedures and comments on this particular case.

  45. Maxine Clarke says:

    Oops, here is the correct link to the PDB’s letter at Nautilus.
