Genome sequencing, Shakespeare style

Sanger sequence
Some DNA sequence. Each column is one sample, and the four colours are those DNA “building blocks” – A, C, G and T.

Our “genome” is the DNA in the cells of our body. It spends most of its time as an unruly-looking blob in the nucleus of the cell, but packages itself up nicely into chromosomes when cells divide. It’s the “genetic code”, the material of heredity that passes on traits from parents to children.

The science of “genomics”, which is what I spend much of my time thinking about, is about making sense of the three billion or so letters of the genetic code that is written in this DNA. It’s helpful to think of it as text – DNA is a long, thin molecule that is made up of four different “letters”. Imagine a string, strung with four types of beads. Each has a single letter on it, and they’re all mixed up together. These make a one-letter shorthand, based on the names of the chemical units that make up DNA: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). When genome scientists talk about “reading the DNA sequence”, this is all they mean: what is the order of those beads on the string? We use very sophisticated equipment to read it, but really, that’s all it comes down to in the end.

Applied Biosystems SOLiD
A piece of fancy DNA sequencing equipment.

DNA sequence is, take my word for it, terribly boring to look at. Here’s an example – in this case, a piece of a gene that is responsible for making salivary amylase, an enzyme that digests starch in your food:

TGGTATCTGTACATACCTTTGATGTCAGTGTTTAGTACACGTGGCTTGGTCACTTCATGGCTAA

Doesn’t look like much, does it? Now, imagine three billion letters of this, arranged in forty-six enormous volumes. Those volumes are chromosomes; most people have two each of chromosomes 1-22, plus two X chromosomes if they’re female, or an X and a Y if male. That three billion letters is roughly equivalent to 857,000 pages of text, or about 28,000 copies of a medium-sized Shakespeare play (say, Romeo and Juliet).
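
If you’d like to check that arithmetic, here is a rough sketch in Python. The letters-per-page and letters-per-word figures are my own assumptions, picked only to be plausible for printed English; the genome size and the play’s word count (roughly 26,000 words) are the round numbers used in this post.

```python
# Back-of-envelope check of the "pages" and "copies" figures.
GENOME_LETTERS = 3_000_000_000  # "three billion or so letters"
LETTERS_PER_PAGE = 3_500        # assumption: a densely printed page
WORDS_IN_PLAY = 26_000          # Romeo and Juliet, roughly
LETTERS_PER_WORD = 4.1          # assumption: average word length in the play

pages = GENOME_LETTERS / LETTERS_PER_PAGE
copies = GENOME_LETTERS / (WORDS_IN_PLAY * LETTERS_PER_WORD)

print(f"about {pages:,.0f} pages")
print(f"about {copies:,.0f} copies of the play")
```

Change either assumption and the totals shift a little, but they stay in the same ballpark.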

46, XY
Chromosomes. Mine, in fact.

The problem of understanding the genome is that while Shakespeare is written in a language that we understand, using familiar concepts (love, jealousy, betrayal), and words that we can look up in a dictionary, the genome sequence is not. It’s a featureless plain of those four letters. It’s got a great deal of meaning embedded in it though, and much has been done to understand it. While a lot of that information came from complicated, specialized biology, some can be found by comparing one genome sequence to another – in other words, looking at variability between individual people. Just as our outward anatomy (hair and eye colour, height, the shape of your nose) varies from person to person, so does the genome sequence. So how do we find this variation?

Returning to Shakespeare, suppose we have a modern edition of Romeo and Juliet, and we suspect that it might have some typographical errors in it. To find them, we could compare it to a “gold standard” – perhaps the first printed edition, or maybe better yet, one of Shakespeare’s original manuscripts. By comparing the language, we could find errors that change the meaning. Of course, some of them will be obvious. Here’s a very famous line from Act II, Scene 2:

O Romeo, Romeo! wherefore art thou Rodeo?

You don’t need the original to compare with, or even know the play, to infer that there’s probably an error in the last word. Genomics researchers can do the same thing – if you show me part of a gene’s sequence, I might be able to guess that one of those A, C, G, or T changes is a problem. That comes with experience, just like reading and speaking English provides you with the experience to guess that “Rodeo” should read “Romeo”.

Reading the rest of the play would make you even more confident that it’s a typo – there are no references to “rodeos” anywhere else in its nearly 26,000 words. Genome scientists use this approach too, relying on computer programs to find things that just “don’t belong”. Rather than rodeos in Shakespeare, we look for changes in DNA that just don’t occur much (like a “STOP” signal in the middle of a gene). Even without knowing what that gene is supposed to look like, we might infer that such a genetic “typo” would be bad.
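
To give a flavour of what such a program might do, here is a minimal sketch in Python that looks for a “STOP” signal turning up before the end of a gene. It assumes the sequence is already lined up in the right three-letter reading frame, and the two toy “genes” are made up for illustration; real tools are far more careful than this.

```python
# The three DNA "STOP" signals in the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(coding_sequence):
    """Return 0-based letter positions of STOP codons found before the final codon."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is allowed (expected, even) to be a STOP, so skip it.
    return [i * 3 for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


# Made-up toy sequences, not real genes.
healthy = "ATGGCTTGGAAACGTTAA"
typo = "ATGGCTTGAAAACGTTAA"  # one letter changed: TGG became TGA

print(premature_stops(healthy))
print(premature_stops(typo))
```

The first sequence comes back clean, while the second is flagged at the codon where a single letter change created an early STOP.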

Other errors might be a lot tougher to spot, though. Consider this quotation, from right after the first one:

Deny the father and refuse thy name;

Without knowing the play, you’d never be able to guess there’s an error there – the first “the” is supposed to read “thy”. It’s just one little letter that changes the meaning a bit, but it’s hard to spot because either “the” or “thy” makes sense. To find it, you need that “gold standard” to compare with.

This is essentially the same as sequencing my genome, and comparing it to yours. They’re both editions of the same book, and tiny differences can have impacts that are huge (a mutation that makes me sick), modest (a change that gives me a higher risk of being sick), or inconsequential. Recent studies suggest that among the three billion or so letters of our genomes, each of us differs by something like three million single-letter typos, and another 45 million that are rearranged in big chunks (in the wrong place, the wrong order, duplicated, or completely missing). Fortunately, almost all of these don’t seem to have much impact on our health.
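
The “compare it to the gold standard” idea can be sketched in a few lines of Python. This toy version assumes the two texts are the same length and differ only by single-letter swaps; real genome comparison also has to cope with missing, extra, and rearranged chunks, which needs proper alignment software. The function name is just something I made up for illustration.

```python
def single_letter_differences(reference, sample):
    """List (position, reference_letter, sample_letter) wherever the two texts differ."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles texts of equal length")
    return [
        (position, ref_letter, sample_letter)
        for position, (ref_letter, sample_letter) in enumerate(zip(reference, sample))
        if ref_letter != sample_letter
    ]


gold_standard = "Deny thy father and refuse thy name;"
modern_edition = "Deny the father and refuse thy name;"

for position, expected, found in single_letter_differences(gold_standard, modern_edition):
    print(f"position {position}: expected {expected!r}, found {found!r}")
```

Exactly the same function works on strings of A, C, G and T, which is really the whole point of the analogy.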

We can stretch this analogy even further. Our “gold standard” Shakespeare script is likely to have been pieced together from at least five different Quartos and Folios, which is also how the first human genome reference sequence was made. This reference, still used by most genome scientists, was assembled from sequences of DNA from nearly 750 different sources. It’s still extremely useful, but it’s only recently that complete sequences from individual humans have become available instead. And just as we use annotations in the margins to tell us what Shakespeare meant by “in choler”, or how one might go about hoisting a “petard”, so also do genome scientists use annotations to describe different pieces of that three billion character book – where the genes are, for example.

So there you go. Genomes are like Shakespeare, and variation between people is like typographical errors. Sometimes they’re invisible (suppose I switched the places of the two letter “o”s in the word “too”), sometimes they don’t change the meaning much (“the” and “thy”), and sometimes they’re disastrous (where is that rodeo, anyway?). Using modern genome science, we can find them, if, as Romeo says, we “know the letters and the language”.

Some technical reading

  • Feuk L, et al. (2006). Structural variation in the human genome. Nature Reviews Genetics, vol. 7 no. 2, pp. 85-97. An older review, but still an interesting discussion of variation between people. Fairly technical. You’ll need a subscription to the journal.
  • Khaja R, et al. (2006). Genome assembly comparison identifies structural variants in the human genome. Nature Genetics, vol. 38 no. 12, pp. 1413-1418. One method of comparing two human genome sequences to each other. Very technical. Article freely available here.
  • Levy S, et al. (2007). The diploid genome sequence of an individual human. Public Library of Science Biology, vol. 5 no. 10, article e254. The first individual human genome sequence, in this case belonging to Dr. J. Craig Venter. Quite technical. Article freely available here.
  • Pang AW, et al. (2010). Towards a comprehensive structural variation map of an individual human genome. Genome Biology, vol. 11 no. 5, article R52. This is one paper that shows people’s genomes differ from each other in millions of different places. Quite technical. Article freely available here.
  • Wheeler DA, et al. (2008). The complete genome of an individual by massively parallel DNA sequencing. Nature, vol. 452 no. 7189, pp. 872-876. The genome sequence of Dr. James Watson, one of the discoverers of the famous “double helix” structure of DNA. Quite technical, although the box about ethical issues of genome sequencing is an interesting and easy read. You’ll need a subscription for this one, too.
Posted in Education, Guest posts | 28 Comments

Reflections on the aftermath of a student protest

I didn’t attend the student protests – except in the sense that I walked down the South Bank (the other side of the river from Parliament), where my walk home was curiously unencumbered by traffic and I heard the hum of helicopters over the square itself.

As everyone knows by now the protest turned violent.
student protest sky view

Last night, I took a stroll down to the then-abandoned Parliament Square to view the aftermath. There were no students left but the Westminster clean streets crew were in full force. I wanted to see for myself if the violence was really that evident, as often when you read the news things appear ‘worse’ than they are in reality.

This time I don’t think the media was exaggerating – all of the windows in the Treasury that face Parliament Square were smashed, as were windows on the Supreme Court and all of the phone boxes accessible to the square. Graffiti was spray-painted on most of the statues in the square (such as ‘Racist Warmonger’ for Churchill) and I felt kind of sad.

But as citizens of a democracy we DO have a right to protest – this is a fundamental right – the right to assemble.  The right to express anger.

Let me say right now, I don’t condone this kind of violence. I don’t think it’s right to attack the Prince of Wales’ car – what can Chuck do about this? He is supported by the state himself and, uh, do you really want him making policy decisions?

What the violence has done is give the student protest media attention and reflect how angry some of them, perhaps, are: angry at Clegg (and other Lib-Dems) about the pledge, angry that their fees have trebled.

Is this the right way to go about it? My immediate response is no, because I think violence is never the answer – and look at the effectiveness of non-violent protest, such as that advocated by Gandhi and Martin Luther King Jr. The Civil Rights movement in the US in the 60s was defined by its relatively peaceful sit-ins – peaceful on the part of the protesters themselves, that is, not the police or authorities at the time. An example of a peaceful protest was evident during the student protest yesterday – the Iraq war protesters who have lived in tents on Parliament Square for the last 8 or so years were actually cordoned off by the police and seemed to have avoided the fray of students chucking placards and sticks.

On the other end of the spectrum, the poll tax riots in the UK (1990) were in large part violent protests – or rather, perhaps similar to the current protests in that it was a rally which then turned violent. That was reflective of the anger that, I would imagine, most of the population felt about the blessed Margaret and her silly decision. But the poll tax, unlike the student fee rise, was a pretty unifying issue: it affected everybody. And it was manifestly unfair and Draconian.

The rise in student fees doesn’t affect much of the population, and many people who don’t go to Uni or don’t send their kids to Uni are perhaps horrified about the violence. I overheard many of these sentiments (which I cannot repeat politely in this post) from others walking around the square last night viewing the wreckage. The principle that students have to contribute to their education was supported by most of the population in the general election in May (both Labour and the Tories, who collectively had the vote majority, campaigned on this platform). It’s a matter of degree, and perhaps a matter of the manner in which it was implemented, but it IS a divisive issue.

But what strikes me, thinking about the protests, is that there is a dilemma. Gaining popular support for stopping student fee increases would go much better by winning hearts and minds, but how do you do this? Non-violent sit-ins on the part of the students (such as the sit-in protest at UCL) go relatively unnoticed by the media, but violence is hateful and doesn’t win over hearts and minds, even among your fellow protesters, though it does get lots and lots of media attention.

Trying to resolve this dilemma is, I think, essential now if students want to move forward with protests. But it is also in part down to the media: it would have been nice if they had covered some of the more good-hearted, funny parts of the student protests, where many of the placards were witty and thoughtful.

funny student protest placard

Posted in Meta | 11 Comments

Ah, to heck with the launch date…

…let’s kick the tires a bit, shall we? I’m sure the Occam’s Typewriter admins will cut this out (geddit?) later if things need to be cleaned up pre-official-launch.

Besides, every good science blogging community needs some DNA sequence, surely?

sequencing gel, circa 1995

ATTTCGTACGTAGCTCGTACGTACGTACGTACGTAGCTACGTAGCTAGCTAGCTAGCTAGCTAGCTAGTGCTAGTCGTAGCTAGCTACTAGTCGTCGATGCTACGTAGCTACATCGTAGCTAGCTAGCTAGCTGCTGTACGTAGTAGTAGTCAGTCAGCTAGCTAGCTAG

I’ll get my coat now.

EDIT – I broke it. html image embedding and line wrap DNW. Sorry guys.

EDIT – rpg fixed it again. Isn’t that picture lovely?

Posted in Guest posts, Uncategorized | 8 Comments

Welcome

The Occam’s Typewriter Irregulars is a place for guest bloggers to write.

Some of these writers might in time have their own blog on OT; others might simply have been invited (or they asked!) to contribute a one- (or two- or three-) off post.

Enjoy!

Posted in Meta | 11 Comments