UK PubMedCentral

I went to another meeting today about UK PubMedCentral. For the first time I began to feel a bit excited about the resource this project is building.
Whenever I hear the phrase “one-stop shop” I groan inwardly. Since bibliographic databases and journals first appeared on the Internet I have heard publishers and database developers talk about their intention to build a “one-stop shop”. To me it betrays an arrogance beyond belief and an ignorance of the nature of information use. Of course it’s an impossible aim. I’m not sure whether UKPMC ever did use that phrase, but I felt that the early information about the project did have a hint of such an ambition, so I thought to myself “here we go again”.
Happily the meeting today eschewed any such notion and I came away feeling that the intention of the UKPMC project participants is to build a truly excellent, useful and usable resource for health science researchers. I also felt confident that the participants have it within them to do the job.
Sophia Ananiadou from NaCTeM explained the work her group has done using text mining techniques on Medline abstracts. This is the third time I’ve heard her talk about this, and it gets more interesting each time. Her aim is to enrich the literature by automatically creating semantic metadata, and thereby to make “undiscovered science” accessible. The MEDIE system is the most vivid example she showed, allowing you to construct a query in the form “subject – verb – object”. For instance, you can ask “what does p53 activate” by searching for subject=p53, verb=activate. Or you can ask “what causes colon cancer” by searching for verb=cause, object=colon cancer. I tried verb=read, object=book but I’m not sure what question that was answering. Currently this MEDIE system is just searching abstracts, but even so it does a pretty good job. It gives a hint of the power of text-mining techniques; I look forward to them being applied to the full-text corpus that is growing in PubMedCentral.
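To make the "subject – verb – object" idea concrete, here is a minimal sketch in Python of how such a query could work once sentences have been reduced to triples. This is not how MEDIE is implemented; the `Triple` type, the `search` function and the toy data (with placeholder names such as `TARGET_A` and `FACTOR_X`) are all my own assumptions for illustration. The hard part, extracting the triples from text in the first place, is left out entirely.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One hypothetical (subject, verb, object) relation mined from a sentence."""
    subject: str
    verb: str
    obj: str


# Toy data with placeholder names; these are NOT real findings.
TRIPLES = [
    Triple("p53", "activate", "TARGET_A"),
    Triple("p53", "inhibit", "TARGET_B"),
    Triple("FACTOR_X", "cause", "colon cancer"),
    Triple("FACTOR_Y", "cause", "colon cancer"),
    Triple("FACTOR_Y", "activate", "TARGET_A"),
]


def search(
    triples: List[Triple],
    subject: Optional[str] = None,
    verb: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching every slot that was given; None means 'anything'."""

    def matches(value: str, wanted: Optional[str]) -> bool:
        # Case-insensitive exact match; an unspecified slot always matches.
        return wanted is None or value.lower() == wanted.lower()

    return [
        t
        for t in triples
        if matches(t.subject, subject)
        and matches(t.verb, verb)
        and matches(t.obj, obj)
    ]


# "What does p53 activate?"
activated = [t.obj for t in search(TRIPLES, subject="p53", verb="activate")]

# "What causes colon cancer?"
causes = [t.subject for t in search(TRIPLES, verb="cause", obj="colon cancer")]
```

Leaving a slot empty is what turns the triple into a question: the unfilled slot is the answer you are asking for.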
I also enjoyed seeing Peter Stoehr’s demonstration of CiteXplore, which he has developed at the EBI. I had heard of it before and looked at it briefly, but had never properly considered it as a serious replacement for PubMed. Now, partly because of the additional content in it and partly because it is going to be at the heart of the UKPMC search service, I can see that it deserves more attention.
One advantage it has over PubMed is coverage: CiteXplore indexes about half a million extra references covering plant and animal science (from the Agricola database), plus a large collection of biological patents and abstracts of Chinese biological journals. Other resources continue to be added (e.g. NICE guidelines), and the UKPMC project is actively considering further resources to enhance the search service.
I was surprised to see that CiteXplore also has citation data. When you display a record it shows the standard bibliographic fields and abstract but it also shows where that article has been cited. And that’s not all: instead of just showing the citing reference it also shows the sentence in which the original article was cited, thus making it easier to interpret the significance of each citation. It would be interesting to compare the number of citations that CiteXplore lists for a given article with the number listed in Scopus and Web of Science, but I’ve not done this.
Finally, CiteXplore has some features that draw on text-mining tools. When you display results you can ask it to highlight proteins in the results. It will then highlight any occurrences of protein names and turn them into links to UniProt. You can do the same for genes or protein-interactions.
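As a rough illustration of the highlighting step, here is a minimal Python sketch that wraps known protein names in links. It is an assumption-laden toy, not CiteXplore's method: the `PROTEINS` lookup table, the `link_proteins` function and the URL template are all hypothetical, and real text-mining tools have to cope with synonyms, ambiguity and spelling variants that a simple dictionary match ignores.

```python
import html
import re
from typing import Dict

# Hypothetical lookup table of protein name -> UniProt accession.
PROTEINS: Dict[str, str] = {"p53": "P04637"}

# Assumed URL pattern for a UniProt entry page; check it before relying on it.
URL_TEMPLATE = "https://www.uniprot.org/uniprotkb/{acc}"


def link_proteins(text: str, proteins: Dict[str, str]) -> str:
    """Return HTML in which each known protein name links to its entry."""
    if not proteins:
        return html.escape(text)

    # Longest names first so a longer name wins over a shorter one inside it.
    names = sorted(proteins, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then emit the link.
        parts.append(html.escape(text[last:match.start()]))
        name = match.group(0)
        url = URL_TEMPLATE.format(acc=proteins[name])
        parts.append(f'<a href="{url}">{html.escape(name)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


linked = link_proteins("Loss of p53 is common.", PROTEINS)
```

Matching here is case-sensitive and exact, which keeps the sketch short but would miss many real mentions.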
Putting all these together and extrapolating the power of the text-mining techniques that Sophia showed, I left the meeting feeling that before long we are going to have a rather special search tool ready to use.
One caveat: all this presupposes that UKPMC succeeds in its aim of gathering in the full text of published research articles. Open Access mandates from the research funders (MRC, CRUK, Wellcome, DoH, etc.), who are also funding UKPMC, should help to achieve a high rate of deposition, but it requires the cooperation of biomedical researchers, who have thus far not proved to be very enthusiastic about Open Access. The promise of a better literature search tool may help to persuade them it is worth it.

About Frank

I am a librarian in a biomedical research institute. I've been around a few years, long enough to know that exciting new things fall into the same familiar patterns. I'm interested in navigating a path for libraries as we slip from print through to electronic information resources.
3 Responses to UK PubMedCentral

  1. Martin Fenner says:

    Frank, I’m jealous. I want a PubMedCentral Germany.

  2. Frank Norman says:

    Martin – well I get jealous sometimes of what the Max Planck are doing. Their eDoc server looks good, and the Virtual Library is not bad. I think Ralf Schimmer has put together an impressive range of resources for MPG people.
    NCBI have talked about their willingness to look at other PMC mirrors – I think Canada have some plans. It took NCBI a long time to come round to the idea of mirroring but now their systems can support it I guess they’re happy to work with more mirroring partners.
    In the UK we are lucky to have the Wellcome Trust, who have pursued the Open Access objective and who badgered the other funders to join with them in funding UKPMC. Such a thing was talked about years before but nothing came of it.
    Maybe another approach could be to turn UKPMC into a broader initiative? Already the EBI (one of the project partners) have a European remit. The BL of course work with other European national libraries. Probably someone somewhere is thinking about this even now.

  3. Stevan Harnad says:

    Central Repositories Like UKPMC Should Be Harvesters, Like Google
    UKPubMed Central (UKPMC), if it is needed at all (why more than one PubMed Central (PMC)? why nation-centred content? do we need more than one Google Scholar?), should not be a locus of direct deposit by fundees. It should harvest from each fundee’s own Institutional Repository (IR), or have content automatically exported to it from there via the SWORD export protocol; the IR is where the funded papers should be deposited directly.
    The universities and research institutes worldwide are the providers of all research output, funded and unfunded, across all fields. Institutional and funder deposit mandates should collaborate, systematically reinforcing one another with convergent mandates to deposit once, institutionally, and export centrally, rather than competing for the still reluctant keystrokes of researchers, most of whom still don’t deposit at all, by requiring divergent deposit, willy-nilly, or multiple deposit.
    Regret that so little is being deposited today? Persuade funders that they lose absolutely nothing, and potentially gain everything, if they simply designate each fundee’s own IR as the default locus of deposit, rather than central loci like PMC or UKPMC.
    A funder mandate covers only a very small portion of total research output, but it reaches into multiple institutions. Institutions are the slumbering giant of Open Access (OA).
    Convergent funder mandates requiring institutional deposit can help gain everything for OA, rousing the Rip Van Winkle from his dreams, motivating each institution to mandate the deposit of the rest of its own research output, funded and unfunded, across all fields.
    In contrast, divergent funder mandates, needlessly insisting on direct central deposit, gain nothing more than what they fund, lose the potential for drawing in all the rest of OA’s target content, and deepen Gaia’s narcolepsy — all for no good reason at all.
    See: “NIH Open to Closer Collaboration With Institutional Repositories”
    Stevan Harnad
    American Scientist Open Access Forum