ResolveRef updated : now with auto-suggest and source code

I updated ResolveRef last night and checked the latest source code in to svn at Google Code.

New features include:

ResolveRef, now prettier, with a comments box by Disqus.

  • Suggest/autocomplete for journal title field, using the journal title lists provided by PubMed.
  • A “Verify” button. Allows a ResolveRef URL to be constructed with the web form and verified as working and valid without actually forwarding the user to the article.
  • Some bugfixes (handled the case where there is no DOI in the PubMed record, handled network timeouts to PubMed)
  • Refreshed visuals
  • Disqus comments box for feedback

In the interest of just getting something working quickly, I implemented the suggest feature in the laziest, possibly most RAM- and CPU-hungry way possible. The “jQuery Suggest” code queries the web app with a substring each time you type a character, and on the server side the app uses a regex to scan a ~1.5 MB list of journal titles held in RAM. I’ve already noticed a few “This request used a high amount of CPU” warnings in the logs, with the threat “High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled”. If my nasty hack starts heating up Google’s datacentre too much, I might have to disable the ‘suggest’ feature until I can implement it “properly”.
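For illustration, here is a minimal Python sketch of the two approaches: the regex scan over every title, and one possible cheaper alternative that does a binary search over a sorted, lowercased list. This is not the actual ResolveRef code; the handful of titles and the function names `suggest_regex` and `suggest_prefix` are made up for the example. Note that the cheaper version only matches from the start of a title, whereas the regex version matches a substring anywhere.

```python
import bisect
import re

# Tiny stand-in for the ~1.5 MB PubMed journal title list (illustrative only).
TITLES = sorted(
    ["J Biol Chem", "J Mol Biol", "Nature", "Nat Genet", "Proc Natl Acad Sci U S A"],
    key=str.lower,
)
LOWERED = [t.lower() for t in TITLES]


def suggest_regex(fragment, limit=10):
    """Lazy approach: scan every title with a regex on each keystroke."""
    pattern = re.compile(re.escape(fragment), re.IGNORECASE)
    return [t for t in TITLES if pattern.search(t)][:limit]


def suggest_prefix(fragment, limit=10):
    """Cheaper approach: binary search for the first title with this prefix."""
    frag = fragment.lower()
    start = bisect.bisect_left(LOWERED, frag)
    matches = []
    for i in range(start, len(TITLES)):
        # Titles sharing a prefix are contiguous in sorted order, so stop early.
        if not LOWERED[i].startswith(frag) or len(matches) >= limit:
            break
        matches.append(TITLES[i])
    return matches
```

Calling `suggest_regex("biol")` finds both "J Biol Chem" and "J Mol Biol", while `suggest_prefix("biol")` finds nothing because no title starts with "biol". That trade-off (less CPU, less forgiving matching) would need weighing before swapping one for the other.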

Reflections, discoveries

This idea of implementing OpenRef-style article identifiers has been a fun experiment, and a nice way to learn more about the ins and outs of PubMed. While implementing the ‘suggest’ feature, a major drawback became even more apparent: journal titles (the [TA] field) used by PubMed are not always easily guessable, and many common abbreviations used in reference lists do not appear to exist in PubMed’s downloadable flat-file journal title lists. These are the lists that ResolveRef uses to make the ‘suggestions’, so ‘missing’ journal titles present a problem if I want users to be able to painlessly construct ResolveRef URLs.

Proc. Natl. Acad. Sci. U.S.A. is a perfect example. Many article bibliographies use PNAS – that would be my guess if I were trying to create a ResolveRef URL for a PNAS paper – and yet this journal title does not exist as far as PubMed’s official journal list is concerned. Issues surrounding this problem were discussed on Noel’s original OpenRef post. The odd thing is that if I search the PubMed Journals database for “PNAS”, it finds it and gives me a record where PNAS is listed under “Other titles(s)”. If someone could point me to where I can get these extra fields, containing the additional names for a journal that are not provided in the downloadable flat files, it would be much appreciated (I bet Alf knows the answer. Or maybe I should email the folks at PubMed). If I can get a better list of titles, the ‘suggest’ feature in ResolveRef would suddenly become a whole lot more useful. Another way around this may be to use CrossRef, and I’m looking into that, but I get the feeling that usage of the CrossRef API is more restricted, so I haven’t bothered with it so far.

Thoughts about the future of ResolveRef / OpenRef

At this stage, ResolveRef URLs are not actually identifiers. They simply act like a frontend to a single-hit PubMed search, and several different ResolveRef URLs can return the same DOI URL (and hence the same journal article). A proper identifier would have a one-to-one mapping between the human-readable ResolveRef URL and a DOI. In the future, I may attempt to get ResolveRef to ‘normalize’ URLs by allowing only a single journal title for each journal and forcing the use of volume numbers if present. The user could use the web interface to enter the values, and ResolveRef would return a normalized URL. Only normalized URLs would successfully forward to the DOI URL; others would return an error with “Did you mean ..insert normalized URL ..?”. One drawback is that this would reduce the guessability of ResolveRef URLs, but the advantage is that they could be treated like identifiers: one article would have one and only one valid ResolveRef URL. By requiring a tool (like the ResolveRef web form) to help users build a valid URL, and removing some of the guessability, ResolveRef would move a little closer to a reinvention of OpenURL (although I think OpenRef/ResolveRef URLs are still more readable and cleaner than OpenURLs, and are much more guessable if you have a bibliography in front of you).
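A rough Python sketch of how such normalization might behave is below. Everything in it is an assumption made for illustration: the alias table, the use of underscores for spaces in the path, the placeholder volume and page numbers, and the function names `normalize` and `resolve`. It also simplifies by always requiring all four fields (journal/year/volume/page), and it is not how ResolveRef currently works.

```python
# Hypothetical alias table: every accepted spelling maps to one canonical title.
CANONICAL_TITLES = {
    "pnas": "Proc Natl Acad Sci U S A",
    "proc natl acad sci usa": "Proc Natl Acad Sci U S A",
    "proc natl acad sci u s a": "Proc Natl Acad Sci U S A",
}


def normalize(journal, year, volume, page):
    """Return the single canonical path for a citation, or None if unknown."""
    # Drop full stops, collapse whitespace and lowercase before the lookup.
    key = " ".join(journal.replace(".", " ").lower().split())
    title = CANONICAL_TITLES.get(key)
    if title is None:
        return None
    return "/{}/{}/{}/{}".format(title.replace(" ", "_"), year, volume, page)


def resolve(path):
    """Forward only canonical paths; otherwise suggest the canonical form."""
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        return ("error", "expected journal/year/volume/page")
    journal, year, volume, page = parts
    canonical = normalize(journal.replace("_", " "), year, volume, page)
    if canonical is None:
        return ("error", "unknown journal title")
    if path != canonical:
        return ("error", "Did you mean {}?".format(canonical))
    return ("forward", canonical)


# A guessed URL is rejected with a hint; the canonical one would be forwarded.
print(resolve("/PNAS/2008/105/1234"))
print(resolve("/Proc_Natl_Acad_Sci_U_S_A/2008/105/1234"))
```

The web form would call something like `normalize` directly, so users never have to guess the canonical spelling themselves.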

A key cosmetic (and philosophical) difference between OpenURL and OpenRef/ResolveRef URLs is that OpenURL uses HTTP GET fields, e.g. ?title=bla&issn=12345, while OpenRef/ResolveRef uses the URL path itself, e.g. somejournalname/2008/4/1996. It’s a bit like one scheme was designed in the age of CGI scripts, while the other was designed for web applications capable of more RESTful behaviour. In my mind OpenURL is more versatile but much uglier, while OpenRef is cleaner and simpler but can only reference journal articles. OpenRef-style URLs will never be able to reference the breadth of resources that an OpenURL can theoretically handle. Maybe a hybrid solution could work … some kind of OpenURL server that could “speak OpenRef” … accepting OpenRef-style URLs where possible, while still dealing with regular OpenURL-style “?bla=blarg&” query strings for everything else.
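To make the hybrid idea a bit more concrete, here is a small Python sketch of a front end that accepts either style and hands back the same kind of field dictionary. The function name `parse_reference` and the `example.org` host are placeholders, and the query keys are just the made-up ones from the example above, not real OpenURL field names.

```python
from urllib.parse import parse_qs, urlsplit


def parse_reference(url):
    """Accept either a path-style or a query-style reference URL."""
    parts = urlsplit(url)
    if parts.query:
        # OpenURL-like style: fields arrive as key=value pairs.
        fields = parse_qs(parts.query)
        return {key: values[0] for key, values in fields.items()}
    # OpenRef-like style: position in the path carries the meaning.
    segments = [s for s in parts.path.split("/") if s]
    names = ["journal", "year", "volume", "page"]
    if len(segments) != len(names):
        raise ValueError("expected journal/year/volume/page")
    return dict(zip(names, segments))


print(parse_reference("http://example.org/somejournalname/2008/4/1996"))
print(parse_reference("http://example.org/?title=bla&issn=12345"))
```

The path style is limited to the four positional fields, which is exactly why it stays readable and why it cannot cover everything a query string can.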

As far as I can tell, OpenURLs are not identifiers with a one-to-one URL-to-article mapping – this is a drawback, since you could not do a Google search to reliably find sites that reference an article via its OpenURL … you theoretically could do this with a normalized OpenRef/ResolveRef URL, since there would be only one unique string used to reference any one article (as Noel pointed out, OpenRef strings have some properties akin to InChI strings). Obviously, to do this cleanly, ResolveRef would need a nicer domain (something akin to dx.doi.org).

Anyhow, I’m not expecting ResolveRef / OpenRef to make any impact on anything anywhere anytime soon. I’m not a librarian, I don’t sit on an NISO/ANSI committee, and I don’t see publishers seeing a need to adopt anything beyond the DOI. But it’s been nice to play around with, and I’m likely to continue doing so.

2 thoughts on “ResolveRef updated : now with auto-suggest and source code”

  1. Thanks Alf … I knew you would have a solution if anyone did.

    Thinking about it more, I realize that it’s not such a huge problem … in conversation scientists often refer to “PNAS”, “JBC”, and “JMB” (PubMed gets JMB ‘wrong’, and won’t understand JBC at all), but if I actually look at reference lists in journals, those abbreviations are never used. It is always “J. Biol. Chem.”, “J. Mol. Biol.” and “Proc. Natl. Acad. Sci. U.S.A.”. Not sure where I got it into my head that “JMB” was actually used in any real reference list.

    The key “user story” I envisage for ResolveRef is something like:
    “Dr X. has a printed journal article (or pdf) and quickly wants to link to an article in the reference list, then post that link to her blog. She quickly types the key details (journal/year/volume/page) directly into her blog (with the ResolveRef domain in front), presses “Preview” and tests the link. It forwards to the article as expected. Dr X. saves making a trip to HubMed (or maybe PubMed, or GoPubMed), and keeps working on her blog post”. So as long as the user has the reference in front of them to copy, guessing at abbreviations like “JMB” shouldn’t be a big issue.
