Research data – JISC and MRC

The annual JISC conference is a great jamboree of new technologies relevant to teaching, learning and research. It is an event I look forward to each year as there is sure to be something of interest, and every now and then there is something outstanding. At this year’s conference, held last month in London, there was nothing outstanding for me. The two keynotes were both lively and interesting, one (Martin Bean) looking at the changing structure of higher education and the other (Bill St Arnaud) at reducing the carbon footprint of IT, but both were a bit outside my field. I appreciated what Martin Bean had to say about the power of openness (perhaps not surprising from the head of the Open University) and the benefits his university has seen from a big iTunes and YouTube presence.
I enjoyed the session on research data, which comprised four short presentations on different aspects of data. Neil Beagrie gave a preview of the second report of the Keeping Research Data Safe project (due end of April). The report includes a cost model of data preservation and guidance for UK universities. There were some surprises: it turned out that data ingest and data access are the biggest costs, while preservation and storage are relatively cheap. The report dealt mainly with centralised data services, not research data from small research teams. Paul Simmonds previewed a report (due in July) on the benefits of Research Data Centres. The report found that a centralised approach is better than a distributed one, providing faster access to data, more data and better reliability, plus training programmes for researchers. I couldn’t help feeling that this was not a big surprise, and not a major contribution to our thinking.
Chris Rusbridge managed to compress the 116-page report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access into just eight minutes. The report was a US/UK collaboration and has been widely praised. Chris particularly recommended reading the section on the economics of data preservation. You can hear Chris talking about it in a podcast. There was a launch symposium for the report in London on 6th May at the Wellcome Trust. I didn’t manage to get to that, but the presentations are all online.
Back at the JISC conference, the final talk on data was from Joy Davidson talking about data management planning and a new tool that the Digital Curation Centre (DCC) are developing to bring together a number of other tools. My eyes glazed over at this point because all the tools have acronymic names like DRAMBORA, AIDA, DAF, LIFE and KRDS and they all sound a little bit alike. I guess I am still too far from practical involvement with data management to have a good feel for the usefulness of these tools. I was more impressed with a later demo of another tool from the DCC: DMP Online, a web-based tool which “draws upon an analysis of funders’ requirements to enable researchers to create and export customisable data management plans”. This looked like a useful tool and should be going live later this year. There was also another session from the DCC on data management plans that I missed.
Another session I missed was on the influence of digital content on research, but I learnt later that it included a fascinating talk on extracting climate data from digitised versions of naval logbooks dating back to the 18th century. Worth a look.
I had another dose of research data earlier last month when I attended a workshop to provide feedback to the MRC Data Support Service. The aim of the project is to define and publish the metadata of the content of some key population health science datasets, and thereby to enable researchers to discover relevant MRC datasets. In essence, it will be an online resource giving a better idea of what is in some major datasets.
They have made some good progress and aim to launch at the end of 2010. I was interested to learn that some of the datasets include clinical, biochemical and genetic data as well as responses to general lifestyle and health behaviour questions. Confidentiality is a major concern, so only the metadata is being provided, and even then not all of it, since sometimes the questions themselves may be revealing (“when did you stop beating your wife?”). Different levels of access will be granted to different communities, and defining these is one challenge for the project.
The most fascinating aspect for me was the challenge of ensuring that the metadata is meaningful and has sufficient structure. For instance, if the second part of a two-part question is “If so, what?” then it is quite meaningless unless it is closely tied to the first part. Capturing this kind of structure is important. Some of the datasets extend over several decades, and the same question may have been asked in slightly different language over the years. Again, a way to link these slightly different questions together is needed. There are many wrinkles to making sense of these large collections of data.
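I don’t know how the project actually models its metadata, but as a minimal sketch of the two kinds of structure described above, assume a simple catalogue in which each question can point to a parent question (so “If so, what?” is always shown with the question it depends on) and can share a concept key with its rewordings in other survey waves. All identifiers, field names and question wordings below are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    qid: str                       # hypothetical identifier, e.g. "1970_Q5"
    wave: int                      # year in which the question was asked
    text: str
    parent: Optional[str] = None   # qid of the question this one depends on
    concept: Optional[str] = None  # shared key linking rewordings across waves


def display_text(q: Question, catalogue: Dict[str, Question]) -> str:
    """Render a question prefixed by its parent chain, so follow-ups keep their context."""
    if q.parent is None:
        return q.text
    return f"{display_text(catalogue[q.parent], catalogue)} -> {q.text}"


def variants(concept: str, catalogue: Dict[str, Question]) -> List[Question]:
    """Return every wording of the same underlying question, ordered by wave."""
    matches = [q for q in catalogue.values() if q.concept == concept]
    return sorted(matches, key=lambda q: q.wave)


# Invented example data: a two-part question and a later rewording of its first part.
q1 = Question("1970_Q5", 1970, "Do you smoke?", concept="smoking")
q2 = Question("1970_Q5a", 1970, "If so, what?", parent="1970_Q5")
q3 = Question("1990_Q8", 1990, "Do you currently smoke cigarettes?", concept="smoking")
catalogue = {q.qid: q for q in (q1, q2, q3)}

print(display_text(q2, catalogue))                          # Do you smoke? -> If so, what?
print([q.qid for q in variants("smoking", catalogue)])      # ['1970_Q5', '1990_Q8']
```

Keeping the parent link and the concept key as separate fields means a follow-up question can inherit its context without being treated as a rewording of its parent.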
The project is being funded by the MRC and carried out by STFC, the University of Oxford and UCL’s CHIME. I think this will be a great resource and look forward to seeing the final service.

About Frank Norman

I am a librarian in a biomedical research institute. I've been around a few years, long enough to know that exciting new things fall into the same familiar patterns. I'm interested in navigating a path for libraries as we move further from print to electronic resources to open research, and become more embedded in research workflows.
This entry was posted in Research data.