Solo Hackday

Once upon a time I might have described myself as a techie. My career was founded on my willingness to install hgopher and Trumpet Winsock and fiddle with autoexec.bat and config.sys. This gave people access to the wonders of the internet back in 1992.

But then things got complicated and I realised I wasn’t a real techie. Sure, I could edit raw HTML using Notepad and I could do basic CSS, but I couldn’t write a Perl script to save my life. So I lost my techie badge and moved on to negotiating journal licences, managing staff and budgets, and other simple tasks like that. Part of me still yearns to be one of those ‘can do’ techies though.

Thus I could not resist the invitation to go along to the SoLo Hackday (I still think of the event as Science Online London, or SoLo, but it is properly called SpotOn now). The advance publicity said they didn’t just want coders but a mix of people so I thought I’d give it a whirl. There were about a dozen people at the Hackday. Each of us expounded our ideas for projects that we would like to work on and we divided into three groups. I was paired with Mark Woodbridge, a bioinformatician from Imperial College.

I have felt for some years that current awareness (finding out new stuff that you want to read) is a mess. It should be possible to have a sophisticated service that ‘knows’ your interests and brings things of interest to you, allowing you to save them or bat them away, rather like Twitter does. We talked about this for a bit and realised that we would probably need six months and an enormous development team to produce something worthwhile, so I scaled back my ambitions a little.

People clearly want to find new stuff that is directly relevant to their research interests, but I think it is important for people also to have a view of what is happening out in the suburbs of their subject. Some might say this is a luxury and no-one has time to read outside a narrow core, but I still maintain that breadth is necessary, especially in an Institute founded on a multidisciplinary approach. I want to find ways to produce a regularly updated reading list of potentially interesting papers. A very simple way is to use other people’s judgements. Certain journals – e.g. the Nature-branded titles – have a “News and Views” section in each issue with commentaries by experts on a small selection of papers from that issue. An editor has selected some papers as being of particular interest, and commissioned knowledgeable experts to write about them.

I thought it would be good to aggregate all those news and views type pieces into an RSS feed.

Step forward Research Views. We nearly stumbled at the first hurdle as we needed to give our project a name. “Research Views” is not a perfect name but we didn’t want to waste time agonising over it. Hackdays are all about compromise – getting something done in a short space of time.

In principle we could achieve what we wanted by filtering and merging the journals’ RSS feeds, but they are not all sufficiently rich in detail (i.e. they do not all indicate which articles are in the News and Views section). It might also be possible via PubMed, since they have an article type ‘Comment’ which includes these articles. But there is a delay of a few days to weeks getting into PubMed, and longer still before articles are all fully tagged. Further, the ‘Comment’ article type seems to include other kinds of comments, not just the “here is an interesting article” kind of comment. So we needed to build an app.
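For anyone curious what “filtering and merging” would look like when a feed is rich enough, here is a minimal sketch in Python. It is not what we built. It assumes, purely for illustration, that each feed is plain RSS 2.0 and labels every item's section in a `<category>` element; the two feeds, their titles, links and dates are all made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny made-up feeds standing in for real journal RSS feeds.
# Assumption: the section name sits in <category>; real feeds may differ.
FEED_A = """<rss version="2.0"><channel><title>Journal A</title>
<item><title>A commentary</title><link>http://example.org/a1</link>
<category>News and Views</category>
<pubDate>Mon, 12 Nov 2012 09:00:00 GMT</pubDate></item>
<item><title>A research paper</title><link>http://example.org/a2</link>
<category>Article</category>
<pubDate>Mon, 12 Nov 2012 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Journal B</title>
<item><title>Another commentary</title><link>http://example.org/b1</link>
<category>News and Views</category>
<pubDate>Tue, 13 Nov 2012 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def commentary_items(feed_xml, section="News and Views"):
    """Yield the items in one feed whose <category> matches the section."""
    channel = ET.fromstring(feed_xml).find("channel")
    journal = channel.findtext("title")
    for item in channel.findall("item"):
        if item.findtext("category") == section:
            yield {
                "journal": journal,
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            }


def merge_feeds(feeds, section="News and Views"):
    """Filter every feed, then merge the results, newest first."""
    items = [it for feed in feeds for it in commentary_items(feed, section)]
    return sorted(items, key=lambda it: it["published"], reverse=True)


merged = merge_feeds([FEED_A, FEED_B])
for it in merged:
    print(it["published"].date(), it["journal"], it["title"])
```

The whole approach hinges on the section label being present in the feed, which is exactly the detail that was missing from some publishers' feeds.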

Mark recommended using Google App Engine (which serves apps from appspot.com) to host the app since he had used it successfully for other projects. It took a little while getting things aligned between his Linux box and appspot but then we were ready.

First off we looked at Nature Publishing Group as I suspected their data would be good quality. I know they have done a lot of work to bring all their primary branded journals up to the same standard. It turned out that their RSS feeds are pretty good and it was relatively straightforward to extract what we needed. Mark used something called XOM to do this. There were some minor inconsistencies, and a couple of the journals caused problems so we excluded them for the time being. Before too long we had a web page with a list of Nature-branded journal titles each with a tick box. Choosing some journals and pressing “Submit” generated an RSS feed of News and Views articles.
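The last step, turning a selection of journals into a single feed, can be sketched roughly as follows. This is an illustration in Python rather than Mark's code (he used XOM, a Java library). The `ITEMS_BY_JOURNAL` store, the journal names and the example.org links are hypothetical placeholders for whatever the extraction step produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical store of commentary items already extracted per journal,
# as (title, link) pairs. The real app fetched these from publisher feeds.
ITEMS_BY_JOURNAL = {
    "Journal A": [("A commentary", "http://example.org/a1")],
    "Journal B": [("Another commentary", "http://example.org/b1")],
    "Journal C": [("A third commentary", "http://example.org/c1")],
}


def build_feed(selected):
    """Return an RSS 2.0 document covering only the ticked journals."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Research Views"
    ET.SubElement(channel, "link").text = "http://example.org/research-views"
    ET.SubElement(channel, "description").text = (
        "Commentaries from: " + ", ".join(selected)
    )
    for journal in selected:
        for title, link in ITEMS_BY_JOURNAL.get(journal, []):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = f"{title} ({journal})"
            ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


# Simulate ticking two of the three boxes and pressing "Submit".
feed_xml = build_feed(["Journal A", "Journal C"])
print(feed_xml)
```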

Next we looked at the Cell Press journals. The RSS feeds here were very thin, and did not reveal which were commentary articles. However, the journal issue contents pages had a good deal of structure. Using Jsoup, with a good dose of persistence and trial and error, Mark was able to fish out the relevant information. We found that the journals were not all quite consistent, and since we were running out of time by that point we only included four journals from this publisher initially.
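To give a flavour of this kind of page scraping, here is a small sketch using Python's built-in `html.parser` instead of Jsoup. The page markup below is invented: it assumes section names appear in `<h2>` headings followed by lists of article links, which is not necessarily how any real contents page is laid out.

```python
from html.parser import HTMLParser

# Invented contents-page fragment; real pages will be structured differently.
CONTENTS_PAGE = """
<h2>Commentary</h2>
<ul>
  <li><a href="/a1">A commentary</a></li>
  <li><a href="/a2">Another commentary</a></li>
</ul>
<h2>Articles</h2>
<ul>
  <li><a href="/a3">A research paper</a></li>
</ul>
"""


class SectionLinks(HTMLParser):
    """Collect (title, href) pairs for links under one named <h2> section."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.in_heading = False
        self.in_wanted_section = False
        self.current_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
        elif tag == "a" and self.in_wanted_section:
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_heading:
            # Each heading switches the wanted section on or off.
            self.in_wanted_section = (text == self.wanted)
        elif self.current_href is not None:
            self.links.append((text, self.current_href))


parser = SectionLinks("Commentary")
parser.feed(CONTENTS_PAGE)
links = parser.links
print(links)
```

A scraper like this breaks as soon as a journal lays out its page differently, which may be part of why inconsistencies between journals cost so much time.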

We had spent not quite four hours on the task and had an app that could splice together commentary articles across two publishers, using different techniques. Later on, Mark added three journals from AAAS, the publishers of Science. I hope we can sort out a few more Cell Press titles, and add commentaries published in the PLOS journals too. Maybe even eLife.

I learnt something from the exercise. Looking at the XML and spotting the structure is something I can learn to do, but building a set of commands to carry out a task is still a mystery to me. It was (mildly!) exciting to work on something and see it actually take shape and even work as intended! I am very grateful to Mark for sharing his skill and for his persistence.

So, ladies and gentlemen, we present – Research Views.

Once you have made your selection of journals and clicked “Submit” the resultant RSS feed will automatically fetch any updates whenever you open the feed. In some web browsers the RSS feed does not display nicely, so you may need to fiddle around to add it to your RSS reader.

We identified all kinds of improvements that might be made. But I am not even sure whether it is remotely useful or not. Maybe it is easier doing it through PubMed. Maybe publishers will enhance their RSS feeds to show commentaries. Maybe people don’t need to know about these commentaries.

One other thought we had is that being the subject of a commentary is in effect a badge of honour for an article. I don’t think this is reflected anywhere in article level metrics.

About Frank

I am a librarian in a biomedical research institute. I've been around a few years, long enough to know that exciting new things fall into the same familiar patterns. I'm interested in navigating a path for libraries as we slip from print through to electronic information resources.
This entry was posted in Research tools, Scientific literature.

4 Responses to Solo Hackday

  1. Oh ye gods, I remember Winsock and Gopher both. Never needed to install either, thank goodness, although at one time I was a dab hand at tweaking DOS autoexec.bat and config.sys files.

    As Argus Filch says in the first Harry Potter movie, “God, I miss the screaming”.

    Well done you for diving back into this. And let me reassure you that it is almost certainly not easier to do this through PubMed, based on my limited knowledge of PubMed, and my knowledge of its limitations. ;)

  2. Frank says:

    Well, I have dived in, but only as an assisted dive. Very much a duo hackday rather than a solo hackday.

    I think the world will always need techies, so that is an incentive to gain some extra skill in that department.

  3. As I mentioned at the time, there are some pretty simple GUI ways of combining & parsing RSS feeds, e.g. using Yahoo Pipes. Admittedly your app is a neater, less kludgy way of doing it.

    However should anyone want to do something similar for a completely different set of journals / RSS feeds this might be worth investigating? I describe it a little here: http://rossmounce.co.uk/2011/10/11/research-tips-tricks-creating-rss-feeds-and-filters/

  4. Cath@VWXYNot? says:

    Very cool! I already subscribe to the full TOC RSS feeds for all the journals on the list that I’m interested in, but I would definitely use something like this if I ever decide I don’t need to read primary research articles any more, or if the list of journals expands to include ones I don’t already read.