You may have seen some of my #nimrlibrarybyebye tweets. These were a sequence of tweets showcasing books that we have been transferring to other libraries. Each tweet included a photo of a book or a handful of books. I will write a proper post about them sometime soon. The ‘byebye’ in the hashtag is to signify that nearly the whole of the library stock is being disposed of. Transferring books to other collections means that part of our library will live on.
I’ve also been selecting things to keep (as mentioned in my recent Library day in the life post). I’ve focused quite a bit on science history – but that includes recent history such as the early days of the human genome project and bioinformatics. Related to that topic, two things in particular caught my eye down in our store.
Protein sequences – Dayhoff
One of these I’d seen before – it is an book of sequences. My memory told me that we had a small hard-bound book of Genbank sequences, but I must have imagined that. The book is a softbound book of protein sequences, from 1967-68. However, I did remember the author correctly: Margaret O. Dayhoff.
Margaret O. Dayhoff was originally a physical chemist and was one of the founders in the field of bioinformatics. She created this first public comprehensive, computerised and publicly available listing of protein sequences, The Atlas of Protein Sequence and Structure. I think it started out in 1965. Read this biographical article for more details about her.
Mainly I’m tickled by the idea of printing sequences in a book! I love this book because today the very idea of a book full of gene or protein sequences seems bizarre. It shows how naturally we use books to share information. There were more volumes and supplements published in this series in the next few years.
I like the image on the dedication page too – though I’m not quite sure what the origin of the sculpture depicted is.
The book later turned into the protein identification resource (PIR).
There is an interesting account in Nucleic Acids Research in 1981 of another of Dayhoff’s projects – a nucleotide sequence database which became the model for other databanks, such as GenBank.
On September 15. 1980, the Nucleic Acid Sequence Database Demonstration Project of the National Biomedical Research Foundation was made available to interested users through telephone access to our computer. Over two hundred user groups requested access during the ten months of the demonstration. …
We had been using the computer system ourselves for some time and had found that a computerized management system was essential to minimize the overall cost of collecting, updating, and critically reviewing the data.
Margaret Dayhoff was held in some esteem by her peers, and I discovered we also have a festschrift dedicated to her, a special issue of the Bulletin of Mathematical Biology.
While sorting out some old files in my office (I’m doing a lot of sorting out and throwing away these days!) I found a manual for the Genbank online service (GOS), 1992. I thought I must have thrown this away ages ago so was pleased to see it again.
I remember that the GOS was my first direct contact with Genbank. Back then I occasionally had people asking me about gene sequences. I discovered that I could search on Medline for a gene name and then identify the sequence accession number. This allowed them to retrieve the sequence from other sources. With GOS, accessed using telnet, I was able to search GenBank directly in its most current format, and I could even get the sequence too. The interface was plain but no worse that what I was used to in other online systems.
Not long after this Gopher came along, followed swiftly by the WWW (as we called it then). These made it dead easy for everyone to find sequence information and my newly acquired skills with GOS became redundant. Information skills had a high churn rate even in 1992.
Some brave souls wrote books about sequence databases and manipulation – knowing that by the time the book appeared in print another dozen databases and software tools would have been developed. Martin Bishop, a scientist at the MRC Human Genome Mapping Project Resource Centre (HGMP-RC), was better placed than most to keep up-to-date and his was one of the first books on the subject I remember buying for the library .
Other classics were Russell Doolittle’s
Computer Methods for Macromolecular Sequence Analysis, part of the Methods in Enzymology series – vol. 266 in 1996. And Andreas Baxevanis and Francis Ouellette’s Bioinformatics : a practical guide to the analysis of genes and proteins – part of the Methods of Biochemical Analysis series in 1998. These days we have both of these online as ebooks.
Immunological sequences – Kabat
The other book that summoned up memories of the days when sequences were printed in book form was Kabat. I remember the 1987 edition was an enormous book that received quite a bit of use when I first started here in the Library. I was excited when I spotted that there was a new edition in 1991 and went to some lengths to purchase a copy for the Library. That was the last edition of the book as it then turned into a database. See this account by Martin in 1996:
“The chief drawback of this database has been that it has only been available in the form of a printed book. These data have recently become available on the global computer Internet, but no method of searching the data has, as yet, been provided. Here, the development of a specialized database program for accessing the antibody data is described. This database software has been made accessible over the World Wide Web, together with a program which allows a novel antibody sequence to be tested against the Kabat sequence database, to identify unusual features of an antibody sequence which may represent cloning artifacts or sequencing errors.”
I was pleased to see that we have a copy of each edition of Kabat, from 1979 through to 1991, on the shelves in the Library store.
Elvin Kabat died in 2000, and the US National Academy of Sciences published a biographical memoir of him saying he:
was a founding father of modern quantitative immunochemistry together with Michael Heidelberger, his doctoral mentor…
The printed and subsequent Web version [of Kabat] was a pioneering effort that preceded the current GenBank database. Indeed, Kabat was also instrumental in urging the National Institutes of Health to support a national DNA sequence database and the development of sequence manipulation software.
It is salutary to think that the early 1990s were such a different world – no web, hardly any internet, email was just starting to be used. And people thought nothing of publishing gene and protein sequences in paper format.
Nowadays the only reason for printing out sequences is to create a museum exhibit:
A couple of reviews delve more deeply into the history of bioinformatics and computational biology:
- Searls DB (2010) The Roots of Bioinformatics. PLoS Comput Biol 6(6): e1000809
- Hagen JB (2000) The origins of bioinformatics. Nature Reviews Genetics 1: 231-236
- Hagen JB (2011) The origin and early reception of sequence databases. Methods Mol Biol. 696:61-77