For twenty years we have had some sort of desktop access to the scientific literature. At first we only had abstracts of articles, and accessed them through fairly clunky interfaces (anyone remember BIDS?). The introduction of PubMed in 1997 improved the interface, but it still offered only abstracts. These days, in most research institutes and universities, the full text of a huge proportion of the world’s scientific research is available through a simple web interface.

That availability has changed the way we use the literature. Long gone is the idea of the one-off “literature search”, an attempt to provide a comprehensive list of papers on a specific topic. Searching is no longer something you have to sit down and plan. More often we just use a quick search on PubMed or Google to answer a question, such as “what genes affect xxx process”, and may do this many times in a day.

But PubMed and similar systems are not very good at answering questions; they were designed as ways to find articles. In fact the whole scholarly publication system is not set up in a way that facilitates answering questions. Journal articles are blocks of text – descriptions of methods, results and discussion – with images and figures interlaced. They are not machine-readable databases of facts. Scientific literature is not part of the semantic web.

Text mining and semantic technology set out to remedy, or perhaps side-step, this. Software like Textpresso has been around for a few years and has powered a query system for a limited range of subject domains (including Drosophila). Now UKPMC has launched a new feature – Evidence Finder – that helps you to ask questions across the whole of PubMed Central – that’s over 400k articles, or 10,004,566 sentences about genes, proteins, diseases & metabolites. This new tool has been developed by UKPMC Labs.

UKPMC Labs is a new component of the UKPMC website, which will showcase novel applications based on UKPMC content.

Evidence Finder is not as slick as Textpresso. The results you get come as a list of articles, not an answer to your question. The exact question you ask can affect your results – “affects”, “controls” and “regulates” all give different results. I still have the feeling that I am searching, not mining. On the plus side, on the page of results there is a box with some alternative questions that you might have wanted to ask.

Added 17 Feb 2012

Thanks to UKPMC for pointing out I had misunderstood how Evidence Finder is intended to work. First you should search for the disease, gene, protein or other entity that you are interested in.  Then choose a question from the list on the right, to further refine your search.

The title “Evidence Finder” suggests to me that the tool is aimed at clinical questions and evidence-based medicine, a bit like PubMed’s Clinical Queries. I like the simplicity of the interface, and I appreciate there is a balance between creating a powerful interface with bells and whistles and keeping it simple to use. It is also much harder to tailor an interface that has to cope with a very wide range of topics.

But it is early days and Evidence Finder looks promising. Try it out and give them feedback.

Oh, and try to make sure that all your research goes into UKPMC so that it can be part of the evidence base!

About Frank

I am a librarian in a biomedical research institute. I've been around a few years, long enough to know that exciting new things fall into the same familiar patterns. I'm interested in navigating a path for libraries as we slip from print through to electronic information resources.
9 Responses to Answering searching questions

  1. Heather says:

    Cool! I hadn’t heard of this new tool.
    But I’d rather find evidence and then have to read the whole article, than have a phrase out of context. There exists some tool like this that finds sentences like Favorite Gene regulates This Target (on phone so no link) but I much prefer seeing if I am convinced by how the authors drew that conclusion. Oh yes, a proprietary software called Ingenuity did that, too, with a homemade and expensive database. I suppose it’s good for generating ideas and for seeing if an article made it past your radar that already tested your favorite hypothesis.

  2. Evidence Finder sounds like it’s almost what I want… but not quite.

    What I’d like is a Google Scholar-like tool that hands me the phrase in question in context, and searches everything (or nearly everything) not just PMC content.

    But – what I’d also want it to do is search a non-redundant, proper database with primary database keys (i.e., UID’s) like PubMed does, and do sub-year date ranges (months would be handy).

    Yes, yes, I know. ;)

  3. Frank says:

    Heather, I agree that it’s important to have the links to the whole article too, but I thought it could be useful to be able to see a summary of all the statements made about “x affecting y”, either on their own or in context as Richard suggested.

    Your point highlights the nature of scientific literature as a work in progress, hence we say ‘claims’ made in an article rather than ‘facts’ reported.

    Maybe one day there will be another tool for automatically assessing whether the methods and logic of the authors are persuasive.

    Richard – you, like me, are impatient for progress! I know the UKPMC team have done a good deal of work with researchers to see what would be useful to them, so with a bit of luck this tool will develop in the directions you suggest as time goes on.

  4. rpg says:

    I wish I’d read your post on Friday, having spent about two hours searching for the binding affinity of ATP for PKCbeta. But this is very intriguing and is now bookmarked–thanks Frank!

  5. Cath@VWXYNot? says:

    Meh. I’ve had a bit of a play with it today using queries from research projects past and present, and it’s just not finding anything, even when asked something as relatively basic as “which proteins interact with BRCA2?” (Well, OK, it found one reference for that search, but the exact same query on PubMed brought up 72, despite PubMed not being optimised for search terms in that format. Changing which to what didn’t improve matters). “Which human genes use transposable element promoters?” gave me no results, even though I’ve published four examples myself and there are hundreds of others in the literature.

    On a related note, I accessed a couple of PDFs earlier today from Current Opinion in Genetics and Development, and each time got a pop-up message saying “ScienceDirect suggests these related articles”, with links to a few other articles from the same stable of journals. In one case the “related” articles looked quite relevant and interesting, but not in the other case. I didn’t like it – I’d rather find my own papers thankyouverymuch! I guess it could be a useful feature, but I don’t like it being so in-your-face.

    This comment tagged #grumpy, #uphillbothwaysinthesnow, and #getoffmylawn

    • Frank says:

      Cath – that is interesting. It is always hard to test novel searching tools without a real example. From the Evidence Finder home page I presume that it is looking only in the open access articles in PubMed Central, so that is a much smaller set than everything in PubMed.

      I just tried it myself. At first I got just one, but then I tried it without the question mark at the end, and got 381! So I think the problem should be readily fixable.

      As for related articles, I think the feature can be useful if it is choosing articles from the whole literature, but if it is just pointing to related articles in the same journal, or from the same publisher, then that is decidedly pointless. PubMed, Web of Science and Scopus all have “related article” links that can sometimes be useful.

      I remember having a trial of a CDROM (that tells you how long ago it was!) of a specialist citation index and showing it to a few scientists. It had a related articles feature. One of our mathematical biologists pointed out that with the number of related articles it was showing for each starting article, you would only need 5 or 6 levels of related articles before you potentially included the whole of the scientific literature!

  6. Frank says:

    UKPMC have responded on Twitter to suggest that we should search for a TERM (e.g. BRCA2), not for a QUESTION. Then refine the results by clicking on one of the generated questions, listed on the right-hand side. This does seem to work rather better.

    Apologies for having misled people. On the UKPMC Labs page it has some suggested questions, so I just assumed that one should put in questions to the search box. On the Evidence Finder page it does explain what to do, but I had stopped reading by then ;-)

    • Ohhhhh, OK.

      Just searched for BRCA2. Two suggested questions from the list on the right hand side:

      What interacts with BRCA2? (18)

      What does BRCA2 interact with? (9)

      So, definitely better, but perhaps still in need of optimisation!