For most normal people, a Sunday afternoon in London might find you down the pub having a lovely roast dinner, a lukewarm pint and a chat about the torrential winter rains or the rugby. But not us geeks: we’ve got standards to maintain. It’s not just a Monday-through-Friday, nine-to-five job – neither rain nor snow, nor sleet nor dark of night shall stay these quirky obsessives from the swift completion of their appointed rounds.
This past Sunday, Eva, Richard and I were sitting around my Docklands flat. Eva, on a flying visit from Toronto, had just about dried off from her swim from the Tube station and was getting her strength back with a bowl of pasta. I was trying to hold up my end of the conversation while finishing up an opinion piece for Nature, and Richard was mixing things up a bit by fiddling with his iPhone instead of his laptop. Plans were afoot for interviewing Eva for a LabLit podcast down at the local watering hole later about something suitably sci-lit geeky.
I’m not sure who brought up the topic first, but Eva mentioned she’d like to do some sightseeing in Cambridge, and Richard told us about the BRCA2 Cycle Path, a stretch of the National Route running over a mile between Addenbrooke’s Hospital and Great Shelford. As a public engagement exercise, the path had been painted in 2007 with about 10,000 stripes of four different colors, each representing a nucleotide base and collectively spelling out the entirety of the gene BRCA2. Residing on chromosome 13 and associated with breast cancer in mutated form, BRCA2 was sequenced at the nearby Wellcome Trust Sanger Institute, which sponsored its enshrinement on pavement.
What exactly, we wondered first off, was meant by ‘gene’? Richard googled up a photo of the start of the cycle path, and we gathered round the screen:
Green, red, yellow, blue, blue, red, green, red, red, yellow, yellow…
Eva and I reckoned that the logical choice for the designers would be the beginning of the coding sequence; therefore the first codon would be methionine, making green an A, red a T and yellow a G (and blue, C by default). I voiced my doubts straight away: most Kozak consensus sequences place a G after the initial ATG, and that didn’t fit the pattern. Plus the size of the protein would be monstrous: ten thousand base-pairs / 3 bases per codon ≈ 3.3k amino acids x 110 daltons/aa ≈ 367 kD – a huge protein. This cheered up Richard considerably: due to his background, he is quite mRNA-ocentric, forever going through life biased towards consideration of untranslated regions.
Abandoning my Nature piece, I navigated to the NCBI Entrez website and pulled out a representative messenger RNA sequence; the protein was indeed that huge, but I failed after only a cursory scan to see a start codon that matched the pattern (and was too lazy to put some welly into it). Eva pointed out that there were 4! (4x3x2x1 = 24) possible ways of matching colors to bases, so searching for all the permutations by hand just might be too much hard work even for geeks on a Sunday. Richard transcribed the first 20 stripe colors from the photo and performed a BLAST search on the assumption that the first stripe was an A, but didn’t get any hits in BRCA2. When Eva suggested he try all 24 possibilities, Richard briefly considered, then abandoned, writing a Perl script to do it for him. Instead, he had a search through Entrez, pulled up another mRNA sequence and searched for the presumed starting-with-ATG string, not by eye but with Apple-F, until he found an ATGCCTATTGG that matched the cycle path colors.
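For the record, the script Richard never wrote needn't have been long. What follows is only a rough sketch, in Python rather than Perl, and not anything we actually ran that afternoon: it takes the eleven stripe colors quoted above, tries all 24 color-to-base assignments, and keeps the ones that open with a start codon. The names `candidate_sequences` and `find_matches` are made up for illustration, and the `reference` string you would pass to `find_matches` is assumed to be an mRNA sequence you have pasted in yourself.

```python
from itertools import permutations

# The first 11 stripe colors, as read off the photo of the start of the path.
STRIPES = ["green", "red", "yellow", "blue", "blue", "red",
           "green", "red", "red", "yellow", "yellow"]
COLORS = ["green", "red", "yellow", "blue"]
BASES = "ACGT"


def candidate_sequences(stripes):
    """Yield (mapping, sequence) for each of the 4! = 24 color-to-base assignments."""
    for perm in permutations(BASES):
        mapping = dict(zip(COLORS, perm))
        yield mapping, "".join(mapping[color] for color in stripes)


def find_matches(stripes, reference):
    """Return the candidates that occur somewhere in a reference sequence.

    `reference` is assumed to be a plain string of A/C/G/T characters,
    e.g. an mRNA record copied out of a database by hand.
    """
    return [(m, s) for m, s in candidate_sequences(stripes) if s in reference]


candidates = list(candidate_sequences(STRIPES))

# Keep only assignments under which the path opens with a start codon.
start_codon_hits = [(m, s) for m, s in candidates if s.startswith("ATG")]

for mapping, sequence in start_codon_hits:
    print(sequence, mapping)
```

Only one of the 24 assignments begins with ATG, and it is the one Eva and I guessed: green for A, red for T, yellow for G and blue for C. Handing the full mRNA to `find_matches` would do Richard's Apple-F for him across all 24 at once.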
Another job successfully completed – but a geek’s work is never done. Borrowing just a little from Jane Austen, if anyone has any other cycle paths that need decoding, please send them on, for we are quite at leisure.