Never mind author contributor ID, or DOIs for articles, or whatever (I can’t be bothered looking up the links): I’m currently trying to find correct names for and de-duplicate entire journals.
there must be a better way
I have to match up all occurrences of a journal’s name, including misspellings and tyops, in our database and correct them to the canonical abbreviation. For further enjoyment I’d like the URL of the journal’s main page, where one exists.
PubMed, frankly, is a bit crap at finding journal names and their homepages. Anyone know of a good resource? Preferably one with an API or at least a script-friendly interface.
In the meantime, my favourite journal so far is
Meded Rijksuniv Gent Fak Landbouwkd Toegep Biol Wet
closely followed by the laconic
Pain.
You mean, you have to do this by hand?
I see it all, now – ‘Information Architect’ is one of those euphemistic job titles, like ‘Recycling Aggregation Engineer’ (dustman) or ‘Imperial Grand Mekon, Galactic Emperor And Absolute Ruler Of All Living Things’ (_Nature_ editor).
Seriously.
How would a computer know, for example, that
Nat Struct Biol
and
Nat Struct Mol Biol
are the same journal? I only know because I published in it and watched the name change.
And in the example above, you might be able to write a program that identified the three instances of J App Cryst, but would the same program be able to tell that Mol Cell and Mol Cells are different journals? I thought that was a mistake until I looked them up.
Hence the request for an online resource.
Or even that ‘Neuorn’ is a tyop of ‘Neuron’?
The Information Architect’s job is to make sure these mistakes do not occur in the rebuild of the input tool, but I have to fix the fubars that already exist, too.
Like all Dutch journals, that one you mention sounds like something a Dutch person might yell after hammering his thumb during DIY.
What you need is a sort of Google-esque “did you mean…” spelling approximator built into your dedupe routine. Then it can compile all the similar ones and ask for human input at the very end. Can’t one of your techy people set up a macro or something? I got one to do something like that when I was text-mining.
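A rough sketch of that "did you mean…" idea, using Python's standard `difflib`. The `CANONICAL` list and the `suggest` helper are made up for illustration; a real list would have to come from a curated source, and the 0.8 cutoff is a guess that would need tuning.

```python
import difflib

# Hypothetical canonical abbreviations, for illustration only.
CANONICAL = [
    "Neuron",
    "Mol Cell",
    "Mol Cells",
    "J Mol Biol",
    "Nat Struct Mol Biol",
    "J Appl Cryst",
]

def suggest(name, canonical=CANONICAL, n=3, cutoff=0.8):
    """Return up to n canonical abbreviations that resemble name, best first."""
    # Compare in lower case, but hand back the canonical spelling.
    lowered = {c.lower(): c for c in canonical}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[h] for h in hits]

print(suggest("Neuorn"))      # a transposition tyop
print(suggest("J MoL Biol"))  # a capitalisation slip
print(suggest("Mol Cell"))    # returns two candidates, so a human still has to pick
```

Anything that returns more than one candidate (or none) would go on the pile for human review at the end.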
Journals are hard to find sometimes. I find Wiley InterScience to be the worst: Google invariably leads you to the WIS pseudo-homepage that doesn’t let you do very much, and it’s very difficult to find the link to the real journal homepage within all the corporate mumbo-jumbo.
Isn’t that more or less the definition of Dutch?
You’re right, that is exactly what I need. However, not only are the techs already flat-out working towards the site relaunch, but you’d have to populate the dictionary, and then still check everything to see if the suggestion is really what we mean. We did this morning discuss a type-ahead type thing for the next iteration of the site (and not allow people to enter anything that’s not canonical) but again, this task will have to be completed first.
Best get to it, then!
I have a very similar problem in my journal database, which features input from many different bibliographical sources. (Incidentally, do you know who made the decision to UPPERCASE ALL JOURNAL TITLES ON WEB OF SCIENCE? There is a special circle of hell for them.) My policy is to keep all journal titles in their full form and abbreviate for medical publications using a giant search-and-replace script.
Some links I found useful:
Biological journals and abbreviations
Medical journal list, very script friendly
How friendly are F1000 with Thomson? They should maintain lists of who merged with whom for all the glamourmags for which they compute impact factors.
Heh. Thanks Maria, those links look shiny.
You could try the CODEN or ISSN. CAS administer CODENs from here – along with ISSNs and ‘official’ abbreviations. (It’s all described here on Wikipedia.)
There is also a short list (~1500) at the CAS website
Ah… we’ve got a list of abbreviations, not useful things like ISSNs. And two and a half thousand non-chemical journals.
Thanks for the thought, though…
Maybe talk to someone at Suncat?
Hi Richard, sounds like you need Named Entity Recognition. The stuff that text-miners get excited about. It’s an “active area of research” – which as you probably know, means most of the available software isn’t very useful just yet…
snort
Yeah. I’m the named entity, and I don’t recognize a bloody thing.
Oh! Just realized that Nature PG has a lot of these guys, with lots of lovely URLs: http://www.nature.com/siteindex/index.html
Wish I could help. I always redux to Google et al.
This kind of issue is exactly why, as we populate our database, or define ontology underlying metadata, users are given drop down menus for data entry.
is my mantra.
I’ve just had that conversation with my head Developer. He’s a good bloke. I have to say that, my job depends on him.
If this can be of any help:
Meded Rijksuniv Gent Fak Landbouwkd Toegep Biol Wet
=
Comm Agr Appl Biol Sci Ghent Univ
And Jenny, you’re absolutely right. The Dutch version does sound like something we would shout when hitting our thumb with a hammer:)
You’re not helping.
Sorry for my cryptic comment above – done in haste on the move.
Suncat is the serials union catalogue for the UK, with serials records from all major UK research libraries (and NIMR!).
I guess though that you are not just after a source of data, but a matching algorithm too? Can’t help there.
It’s actually turning out to be reasonably doable, if tedious. Got a tech to hit google and return the first hit for each abbreviation, which is helping populate my URL list sensibly.
When I’ve made this list, I’m flogging it.
Apparently, CAS, as the administrator of the CODENs list, assigns them to just about anything that looks vaguely like a journal, even if it’s not chemical, and not abstracted by them… apparently… although I’ve not tried it. Still, no-one reads anything that hasn’t got ‘chem.’ in its title somewhere do they???? [JOKE!!!]
That’s interesting, because the first few I looked for weren’t there.
CAS does have a very wide coverage – I think about 13,000 serials – but it doesn’t cover everything.
Thinking about it, CAS, in the guise of Scifinder, must have done something similar, as they have a ‘locate article’ feature, which in the journal field will figure things like BMCL, Bioorg Med Chem Lett etc to all mean Bioorganic & (or is that ‘and’?) Medicinal Chemistry Letters, for example – but not sure how well it deals with ‘common’ typos.
And no, I’m not on any sort of commission with CAS – it’s just that they happen to be the ones I interact with most!
Yes, others must have had to solve this problem at some point (so for example ISI and Scopus had to deal with NSB and NSMB at some point). If you asked it to search the first several characters in the names (e.g. Nature Struc*), you would find both names plus a bunch of tyops, which might narrow things down a bit?
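For what it's worth, the first-several-characters search could be sketched like this. The `group_by_prefix` helper and the sample names are invented for illustration, and the prefix length of 6 is an arbitrary choice.

```python
from collections import defaultdict

def group_by_prefix(names, length=6):
    """Bucket names by their first few letters/digits, ignoring case and spacing."""
    groups = defaultdict(list)
    for name in names:
        # Keep only letters and digits so spaces and dots do not affect the key.
        key = "".join(ch for ch in name.lower() if ch.isalnum())[:length]
        groups[key].append(name)
    return dict(groups)

# Illustrative entries only, including one deliberate tyop.
names = [
    "Nat Struct Biol",
    "Nat Struct Mol Biol",
    "Nat Strcut Biol",
    "Mol Cell",
    "Mol Cells",
]

for key, members in group_by_prefix(names).items():
    print(key, members)
```

Each bucket is only a shortlist to look over by eye: a prefix cannot tell a renamed journal from a different one, and a tyop inside the prefix itself would land in the wrong bucket.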
My comment’s useless, so here’s a URL to help: http://www.nature.com/nsmb
Life is Pain
Life = Pain
Time = Life
ergo…
I always thought Gut was a good journal name.
It would be fun to cite Pain in Gut.
I’m oddly proud of my solitary paper in Gut – mainly because Gut is the only scientific journal I’ve ever seen feature as “guest publication” (for the missing words in headlines round) in the TV show Have I Got News For You.
PS “Neuorn” sounds like a kind of being in one of Tolkien’s books to me. Just thought I’d say that before Henry did.
Austin, your paper looks from the abstract like it might involve the release of calcium from intracellular stores…?
or Pain in Blood
Richard, Pain is here
Thanks guys. Only another 2392 to go.
Actually, the NPG pages were really helpful, I didn’t realize so many journals were theirs, and they have a lovely page of them all.
Sarbjit, good plan, but when you have things like
Mol Cell and
Mol Cells
(two different journals)
J Mol Biol and
J Mol Biol or even
J MoL Biol
(same journal, two misspellings)
then it gets tricky.
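One cheap first pass, sketched here under the assumption that case, punctuation and spacing never carry meaning in an abbreviation, is to compare entries on a normalised key. The `normalize` helper is hypothetical, not anything from our database.

```python
import re

def normalize(abbrev):
    """Fold case, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", abbrev.lower())
    return " ".join(text.split())

# Capitalisation and spacing slips collapse onto the same key...
print(normalize("J MoL Biol") == normalize("J  Mol Biol"))
# ...while genuinely different journals stay apart.
print(normalize("Mol Cell") == normalize("Mol Cells"))
```

This only catches the mechanical slips; a true misspelling still needs fuzzy matching or a human.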
Cath: yep, among other things. I am a calcium signaler / microscopy geek by scientific trade
Re. journal names, you can’t get it confused with anything, but after sitting through another 40 minute seminar of myriad incomprehensible abbreviations, slides of unlabelled 20-lane Western blots, or handle-turning mutagenesis of every residue in a protein, I often think it is no accident that there is a journal whose abbreviation is Anal Biochem…
Don’t forget Biochemistry and Biochemistry. Yes, you can find two journals with the exact same title, so then we add on the place of publication to disambiguate, so the second of those becomes Biochemistry (Moscow). Strictly speaking the first one should be Biochemistry (Washington) but we tend to omit the qualifier for the more familiar title.
Yah, I have a couple like that, too.
My favorite site for figuring this stuff out is Genamics JournalSeek at http://journalseek.net/index.htm Most of the abbreviations are in there.
I too am searching for the translation to Meded.Rijksuniv.Gent Fak.Landbouwkd.Toegep.Biol.Wet. What a pain