Book sequences

You may have seen some of my #nimrlibrarybyebye tweets. These were a sequence of tweets showcasing books that we have been transferring to other libraries. Each tweet included a photo of a book or a handful of books. I will write a proper post about them sometime soon. The ‘byebye’ in the hashtag is to signify that nearly the whole of the library stock is being disposed of. Transferring books to other collections means that part of our library will live on.

I’ve also been selecting things to keep (as mentioned in my recent Library day in the life post). I’ve focused quite a bit on science history – but that includes recent history such as the early days of the human genome project and bioinformatics. Related to that topic, two things in particular caught my eye down in our store.

Protein sequences – Dayhoff

Atlas of protein sequence and structure

One of these I’d seen before – it is an book of sequences. My memory told me that we had a small hard-bound book of Genbank sequences, but I must have imagined that. The book is a softbound book of protein sequences, from 1967-68. However, I did remember the author correctly: Margaret O. Dayhoff.

Margaret O. Dayhoff was originally a physical chemist and was one of the founders in the field of bioinformatics. She created this first public comprehensive, computerised and publicly available listing of protein sequences, The Atlas of Protein Sequence and Structure. I think it started out in 1965. Read this biographical article for more details about her.

Dedication page

Mainly I’m tickled by the idea of printing sequences in a book! I love this book because today the very idea of a book full of gene or protein sequences seems bizarre. It shows how naturally we use books to share information. There were more volumes and supplements published in this series in the next few years.

I like the image on the dedication page too – though I’m not quite sure what the origin of the sculpture depicted is.

The book later turned into the protein identification resource (PIR).

There is an interesting account in Nucleic Acids Research in 1981 of another of Dayhoff’s projects – a nucleotide sequence database which became the model for other databanks, such as GenBank.

On September 15. 1980, the Nucleic Acid Sequence Database Demonstration Project of the National Biomedical Research Foundation was made available to interested users through telephone access to our computer. Over two hundred user groups requested access during the ten months of the demonstration. …

We had been using the computer system ourselves for some time and had found that a computerized management system was essential to minimize the overall cost of collecting, updating, and critically reviewing the data.

Margaret Dayhoff was held in some esteem by her peers, and I discovered we also have a festschrift dedicated to her, a special issue of the Bulletin of Mathematical Biology.

Genbank online service – manual

While sorting out some old files in my office (I’m doing a lot of sorting out and throwing away these days!) I found a manual for the Genbank online service (GOS), 1992. I thought I must have thrown this away ages ago so was pleased to see it again.

I remember that the GOS was my first direct contact with Genbank. Back then I occasionally had people asking me about gene sequences. I discovered that I could search on Medline for a gene name and then identify the sequence accession number. This allowed them to retrieve the sequence from other sources. With GOS, accessed using telnet, I was able to search GenBank directly in its most current format, and I could even get the sequence too. The interface was plain but no worse that what I was used to in other online systems.

Not long after this Gopher came along, followed swiftly by the WWW (as we called it then). These made it dead easy for everyone to find sequence information and my newly acquired skills with GOS became redundant. Information skills had a high churn rate even in 1992.

Martin Bishop’s 1994 book. Guide to human genome computing / edited by Martin J. Bishop. London: Academic Press.

Some brave souls wrote books about sequence databases and manipulation – knowing that by the time the book appeared in print another dozen databases and software tools would have been developed. Martin Bishop, a scientist at the MRC Human Genome Mapping Project Resource Centre (HGMP-RC), was better placed than most to keep up-to-date and his was one of the first books on the subject I remember buying for the library .

Other classics were Russell Doolittle’s
Computer Methods for Macromolecular Sequence Analysis, part of the Methods in Enzymology series – vol. 266 in 1996. And Andreas Baxevanis and Francis Ouellette’s Bioinformatics : a practical guide to the analysis of genes and proteins – part of the Methods of Biochemical Analysis series in 1998. These days we have both of these online as ebooks.

Immunological sequences – Kabat

The other book that summoned up memories of the days when sequences were printed in book form was Kabat. I remember the 1987 edition was an enormous book that received quite a bit of use when I first started here in the Library. I was excited when I spotted that there was a new edition in 1991 and went to some lengths to purchase a copy for the Library. That was the last edition of the book as it then turned into a database. See this account by Martin in 1996:

“The chief drawback of this database has been that it has only been available in the form of a printed book. These data have recently become available on the global computer Internet, but no method of searching the data has, as yet, been provided. Here, the development of a specialized database program for accessing the antibody data is described. This database software has been made accessible over the World Wide Web, together with a program which allows a novel antibody sequence to be tested against the Kabat sequence database, to identify unusual features of an antibody sequence which may represent cloning artifacts or sequencing errors.”

I was pleased to see that we have a copy of each edition of Kabat, from 1979 through to 1991, on the shelves in the Library store.

Kabat – 5 different editions, 1979-1991

A longer history of Kabat appeared in 2000 in Nucleic Acids Research. This described a 30-year history, going back to 1970 when the data compilation first appeared as an article in J Exp. Med.

Elvin Kabat died in 2000, and the US National Academy of Sciences published a biographical memoir of him saying he:

was a founding father of modern quantitative immunochemistry together with Michael Heidelberger, his doctoral mentor…

The printed and subsequent Web version [of Kabat] was a pioneering effort that preceded the current GenBank database. Indeed, Kabat was also instrumental in urging the National Institutes of Health to support a national DNA sequence database and the development of sequence manipulation software.

It is salutary to think that the early 1990s were such a different world – no web, hardly any internet, email was just starting to be used. And people thought nothing of publishing gene and protein sequences in paper format.

A page from the Atlas of protein sequences and structure

Nowadays the only reason for printing out sequences is to create a museum exhibit:

When the human genome is printed out in a series of books, the DNA sequence fills more than 100 books. Image Courtesy: Russ London’s photograph of the Human Genome in the “Medicine Now” room at the Wellcome Collection in London.

A couple of reviews delve more deeply into the history of bioinformatics and computational biology:

Searls DB (2010) The Roots of Bioinformatics. PLoS Comput Biol 6(6): e1000809
Hagen JB (2000) The origins of bioinformatics. Nature Reviews Genetics 1: 231-236
Hagen JB (2011) The origin and early reception of sequence databases. Methods Mol Biol. 696:61-77

2 Responses to Book sequences

Richard Wintle says:

August 27, 2016 at 5:40 pm

What a wonderful piece, Frank. Thanks for that.

I’d forgotten about Kabat! I was bashing away at human VH sequences for much of my PhD, which started in 1990. I don’t recall the hardcopy books but I sure do remember his name and using… I’m thinking maybe a “Kabat database” or similar(?).

I also remember McKusick’s Mendelian Inheritance in Man books… and seeing it first appear on CD-ROM. You could buy a subscription for updates and they’d send you another CD every year, or six months, or something. This was of course before the online version (OMIM) that we all use now.

And among my favourite paper relics of the Human Genome Project era were the Human Gene Mapping tomes, published as special issues of Cytogenetics and Cell Genetics as I recall. Summary articles of mapping progress on each chromosome, and massive lists of the (sometimes inferred) cytogenetic locations of all the known genes. Completely archaic now, but an important document of the HGM conferences. To this day, I’m proud that one gene I mapped on chromosome 1 appeared in these, as of the “HGM 10.5” version if I’m remembering correctly.
Frank Norman says:

August 31, 2016 at 3:02 pm

Thanks, Richard.

Yes, McKusick was another great printed source that turned into a database. I’m sure we have a copy somewhere. It was more text-based than data-based I recall, so less of an oddity as a printed thing. When it became part of the NCBI data-ecosystem it provided a rich source of crosslinks.

[Update 26 Sept: We do have a copy of the 2nd edition, but not the first edition].

It seems that the HGM documents are not open but only available to purchasers / subscribers. That seems a shame.

https://www.karger.com/BookSeries/Home/224113

Comments are closed.

Protein sequences – Dayhoff

Immunological sequences – Kabat

About Frank Norman

2 Responses to Book sequences

Recent Posts

Recent Comments

Archives

Categories

Blogroll

Meta