Open Knowledge Foundation workshop

I recently attended a workshop on open scientific resources organised by the Open Knowledge Foundation (OKF). The OKF blog has some notes from the meeting, but here are a few of my own reflections on the day.

The definition of open data was new to me. Providing free access via a web interface does not constitute openness. To be open, a data resource must require no payment and no logon; it must allow redistribution, and it must allow the full dataset to be downloaded without barriers. This may seem like a tough definition of “open”, but the point is to make it as easy as possible for data to be used and reused. There was quite a bit of discussion about licence conditions and the way that these can interfere with data use. Some prefer the use of “community norms”, i.e. non-legal mechanisms (one person suggested a curse!). Crucially, scientists wanting to make data available should also state clearly what rights others have to use it, because uncertainty over those rights is itself a barrier to use. The Open Knowledge Definition has more information on all this.
Practical ways to help achieve openness were also discussed: guidelines for opening up data, advocacy for open data, and expanding the work of editors and curators.
It was interesting to see possible solutions emerge from the discussion, though it was clear that much has already been done by the Open Knowledge Foundation and other players (e.g. the Science Commons). The OKF’s Comprehensive Knowledge Archive Network (CKAN) is a repository of datasets that are open according to the above definition. This is a good base for further action, such as a proposed “unlocking service” that helps people to request that a dataset be made open. It was also proposed to create a simple recipe for making a dataset open.
Advocacy is needed to embed openness in scientific culture. There is a need to educate students, particularly at postgraduate level, and to engage with both research funders and publishers. Publishers will not want to take too much of a lead in this but should be happy to help enforce community norms once those are settled. The need for advocacy also extends to the software industry and instrument makers: if their products output data in non-open formats, that will create problems for those wanting to reuse the data.
There is a need for better recognition of the work of data packagers, data curators, and others who work with data. This to some extent echoes the points made in the recent JISC report that I blogged about previously. It was suggested that there is a need for data packagers, akin to the open source software packagers such as Debian. I wasn’t convinced by this, but time will tell. OKF hope to recruit more people to help curate and expand the CKAN registry.
Cameron Neylon also mentioned the concept of the fully supported paper and gave a number of examples of steps towards achieving this. His Science in the Open blog is a good source of more reading on this.
It was a highly interactive workshop, essentially a free discussion with just a few nudges in direction provided by the organisers, Jonathan Gray and Rufus Pollock. The presence of open science gurus such as Cameron Neylon and Peter Murray-Rust, together with Tim Hubbard from the Sanger Centre, ensured a core of scientific data expertise. Also present were one or two scientists and researchers, a publisher, a data archive manager and a couple of other librarians besides myself.

About Frank Norman

I am a retired librarian. I spent 40 years working in biomedical research libraries.