Earlier this month Open Research London held a half-day event at the Francis Crick Institute to mark Love Your Data week, comprising six half-hour talks. The talks were engaging and focused on research data discovery, with detours into publishing, preprints and AI. Attendees were well supplied with coffee and pastries during registration and the halftime break, so there was plenty of time for chatting with fellow attendees. There was also an option to move on to a local pub afterwards to continue conversations and catch-ups. All in all, it was a great chance to learn from, and talk with, knowledgeable people.
Here are some notes about each talk. I’ve put them in a different order from the actual programme.
- Dan Crane, from the Open Research team at King’s College
Dan started by describing the setup at King’s and outlining what FAIR data is (Findable, Accessible, Interoperable, Reusable). Then he outlined how his team capture datasets into the King’s Open Research Data System (KORDS) – a repository that uses FigShare.
The team at King’s have developed a metadata template to guide and encourage researchers to add descriptive metadata. There’s a balance to be struck between encouraging them to be as thorough as possible, whilst not nagging them so much that they are put off depositing their dataset. There’s also a balance between getting the researcher to do the work versus the Open Research Team taking on the work. I really recognise that dichotomy. Depositors in KORDS are encouraged to include a readme file and to add their ORCIDs. The system can also capture relationships between datasets. There is a good range of guidance and training material to help users to get their head around FAIR and open data sharing. Researchers are shown how they should reference the dataset deposited in KORDS when creating their data access/availability statement in the paper.
The King’s team also encourages researchers to create metadata-only records for datasets that cannot be shared openly, to give some exposure to this data.
Finally, Dan talked about how they create DOIs for grey literature in their PURE repository. Currently this is done manually. Guidance on ‘how to cite this’ is put on the page of each grey literature document that is posted in the repository.
I wonder how easy it is to get the message out to researchers that creating a DOI for any grey literature that they create is a Really Good Idea? I found that researchers (in life sciences at any rate) have a tendency to stick documents on general web pages without considering that there might be a better way to make them discoverable and accessible.
- Jonathan Green and Julie Baldwin from Univ Nottingham libraries
Jonathan and Julie described the process they use to find research datasets that have been deposited on external platforms by Univ Nottingham researchers, and then to import metadata from Scholix into the Nottingham repository. The aim is to create metadata-only records in the Nottingham data repository. Often research data is deposited into specialist domain repositories and thus is not easily visible or knowable via the university where the authors work. The Nottingham service is based on code shared by Durham/Manchester and uses the Scholix service as a data source.
Jonathan and Julie explained that initially they kept the project small-scale, due to resourcing constraints. The project started as an exploration – running some code to find what datasets existed ‘out there’ and then checking them manually before converting the metadata so it could be imported into the DSpace repository. The process has now been streamlined and further automated.
They had an interesting slide reflecting on some of the challenges and learning points. These boiled down to observing that the world of research data is messy, unpredictable and complex, hence human intervention is needed.
I found it very interesting to see this idea in practice as it’s something I’ve long thought could be useful. You can also import metadata from the EBI’s BioStudies database and I’ve seen this done, but for the purpose of research evaluation rather than for increasing the visibility of the datasets.
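To make the Nottingham-style workflow a little more concrete, here is a minimal Python sketch of the conversion step: taking one article-to-dataset link and turning it into a metadata-only record. It is not the Durham/Manchester code or the Nottingham implementation. The shape of `SAMPLE_LINK` is only loosely modelled on Scholix-style link records, the `dc.*` keys are illustrative Dublin Core-style field names, and the function names, the `needs_review` flag and all the values are hypothetical. The flag is there to reflect the point that a human still needs to check anything incomplete.

```python
# Hypothetical link record, loosely modelled on a Scholix-style
# article-to-dataset link. Field names and values are assumptions.
SAMPLE_LINK = {
    "RelationshipType": {"Name": "IsSupplementedBy"},
    "Source": {
        "Type": "publication",
        "Title": "An example article",
        "Identifier": [{"ID": "10.1234/example.article", "IDScheme": "doi"}],
    },
    "Target": {
        "Type": "dataset",
        "Title": "Example dataset",
        "Identifier": [{"ID": "10.5678/example.dataset", "IDScheme": "doi"}],
        "Publisher": [{"name": "Example Domain Repository"}],
        "Creator": [{"Name": "Researcher, A."}],
        "PublicationDate": "2023-05-01",
    },
}


def first_doi(entity):
    """Return the first DOI listed for an entity, or None if there is none."""
    for ident in entity.get("Identifier", []):
        if ident.get("IDScheme", "").lower() == "doi":
            return ident.get("ID")
    return None


def to_metadata_only_record(link):
    """Map one link to a metadata-only record (illustrative field names).

    Returns None when the link target is not a dataset.
    """
    target = link.get("Target", {})
    if target.get("Type", "").lower() != "dataset":
        return None

    dataset_doi = first_doi(target)
    article_doi = first_doi(link.get("Source", {}))

    record = {
        "dc.title": target.get("Title"),
        "dc.contributor.author": [
            c["Name"] for c in target.get("Creator", []) if c.get("Name")
        ],
        "dc.publisher": [
            p["name"] for p in target.get("Publisher", []) if p.get("name")
        ],
        "dc.date.issued": target.get("PublicationDate"),
        "dc.identifier.uri": (
            f"https://doi.org/{dataset_doi}" if dataset_doi else None
        ),
        "dc.relation.isreferencedby": (
            f"https://doi.org/{article_doi}" if article_doi else None
        ),
    }

    # Anything empty is flagged so that a person checks it before import.
    missing = [key for key, value in record.items() if not value]
    record["missing_fields"] = missing
    record["needs_review"] = bool(missing)
    return record


record = to_metadata_only_record(SAMPLE_LINK)
```

In a real service the link records would come from a data source such as Scholix, and the output would need converting into whatever import format the repository expects. The sketch only shows the mapping and the "flag it for a human" step.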
- Holly Ranger, from University of Westminster
Holly talked about capturing research outputs from practice research. This kind of research is often intangible and collaborative, and is shaped by its relations with other practice research. Holly noted that existing standards aren’t always suitable for arts research outputs. To improve the representation of practice research in the repository, Westminster has made various changes to the schemas for these outputs. A particular feature is the ‘overlays narrative and context’. Holly said that contextualised data is really important for practice research. Holly mentioned persistent identifiers: RAiD, DataCite DOIs and CRediT. RAiD has proved to be a good fit for these outputs.
Westminster has embedded guidance on making practice research open within the practice PhD research handbook – explaining how to document the practice and research journey.
The second aspect of Westminster’s steps towards embedding OA into practice research was implementing ‘Theory of change for research design’. I missed the details of this part of the talk. Holly mentioned the Practice Research Voices project, funded by the AHRC, whose final report and recommendations have been published.
- Maria Levchenko, from the Europe PMC team at EBI
Maria talked about preprint discovery and preprint review/feedback, focusing on preprints in life sciences. She started with a definition of what a preprint is, and showed the growth in adoption of preprints and of preprint evaluations being posted. She mentioned that there are up to 60 preprint servers that have some biomedical content, and there are more than 35 initiatives reviewing life science preprints. This means that discovering preprints in life sciences can be challenging.
Europe PMC has been indexing preprints since 2018 and now has 735k preprints from more than 30 servers. Of those, 260k have been published in peer-reviewed journals and 10k have some kind of feedback.
Europe PMC also indexes preprint feedback and links it to the original preprint, to help readers assess the preprint. The feedback can be any kind of comment on the pre-published work. Though still small, the number of preprint peer reviews is now increasing. Researchers can gain exposure and credit through providing feedback on preprints. Europe PMC also links to funder and grant information about the research in the preprint, and to citations of the preprint. These are all indicators of trust. Maria mentioned eLife’s Sciety website and EMBO’s Early Evidence Base website. Both of these categorise preprint feedback, but their categories are not the same. It would be helpful to harmonise types of preprint feedback.
Maria highlighted the issue of licences for reviews, which determine whether and how the reviews can be reused. For example, can they be translated, text-mined, or used by AI tools to provide summaries? Free to read does not mean free to reuse. Hence there is a growing need for preprint licences. Europe PMC subsequently posted on Twitter:
If you want to be part of the conversations to define best practices and community standards sign up here: buff.ly/3uyZC3V
You can use the Europe PMC Article Status Monitor tool to check whether a preprint is:
- Published in a journal
- Available as a newer version
- Mark Hahnel, Digital Science
Mark’s talk was titled “Global Academic Publishing: Where will experimentation lead?” He enumerated some of the qualities we look for in effective academic publishing: speed, openness, cost-effectiveness, trust. It’s hard to combine all four of these. Mark suggested that trust is the most important.
Mark sketched out some of the current problems in scholarly publishing: paper mills, research integrity failures, the volume of research that needs peer review. He pointed out that over the last 20 or so years the amount of academic research published has tripled, but there aren’t three times as many academics. Hence the peer review burden on each academic is increasing, and this is not sustainable. He asked whether/how we can limit the number of papers and datasets that need to be reviewed?
Mark said he doesn’t have answers to these problems, but emphasised that we need innovation in publishing in order to find the answers. He added that innovation can add complexity to the whole system, so it is not always welcomed by researchers/authors.
- Andrea Chiarelli, Research Consulting
Andrea talked about AI’s influence on open research discoverability and impact. He stated that there are many AI tools today and it’s hard to keep up. There’s even a website called ‘There’s an AI for that’.
AI tools for enhancing search/discovery/review are getting better. Some tools can recommend what to read. Others can enhance research objects with machine-generated metadata, to improve discovery. Other AI tools can help to translate academic language into language that speaks to the policy and practitioner communities that can benefit from research findings. AI tools can also help with trend discovery and analysis.
Andrea highlighted three tools that are worth a look:
He acknowledged that there are drawbacks to AI. It’s a black box – leading to limitations in transparency and reproducibility. It’s difficult to understand the tools and language of AI. There is potential for bias and ‘hallucinations’ with generative AI. There are also data security and privacy concerns.
Finally, Andrea posed the question of whether AI is a research partner or a research predator.
He presented the pros (research partner) thus:
- AI becomes a powerful ally for researchers, enabling them to deliver more efficient, comprehensive and rigorous work.
- AI tools help researchers with literature review, data analysis, hypothesis generation, experiment design and paper writing.
- Researchers leverage AI to enhance their creativity, curiosity and critical thinking.
- AI helps democratise research, making it more accessible, inclusive and diverse.
and the cons (research predator) thus:
- Researchers lose their autonomy, agency and identity as AI takes over several facets of their roles.
- AI enables a competitive and metric-driven culture, where researchers are pressured to publish even more and faster, sacrificing quality and integrity.
- AI widens the gap between disciplines, institutions and countries, creating a monopoly of research by a few powerful actors.
- AI tools are used to manipulate, plagiarise, and fabricate research results at scale by paper mills and toxic actors.
A question from the audience highlighted sustainability concerns with using free AI tools: who owns the infrastructure that we become dependent on when we use these tools? What “hidden costs” are associated with this? This is an aspect that needs further thought by anyone building services that rely on these tools.