A Greasemonkey script: Library Ezproxy Forwarder

Update: I’m typically using Google Chrome or Chromium these days, so it’s unlikely I’ll update this script in the future. For a similar extension for Chrome, try the EZProxy Redirect extension.

Many university libraries use server software called EZproxy to authenticate users and arbitrate access to full-text online journal subscriptions. Essentially, EZproxy rewrites every hyperlink so that traffic passes through the proxy (rather than relying on a conventional browser proxy setting). For example, http://www.sciencemag.org/cgi/content/full/313/5785/314 is changed to http://www.sciencemag.org.ezproxy.lib.unimelb.edu.au/cgi/content/full/313/5785/314 . If the user is not logged in to the proxy (i.e. has no fresh, valid cookie), a login screen is shown before they are forwarded to the journal site.
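To make the rewriting concrete, here is a minimal Python 3 sketch of the hostname mangling described above. It is not the Greasemonkey script itself (which is JavaScript); the function name ezproxify and the PROXY_SUFFIX constant are hypothetical names for illustration, and the suffix is just the University of Melbourne one from the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed proxy suffix, taken from the example above; substitute your own.
PROXY_SUFFIX = ".ezproxy.lib.unimelb.edu.au"


def ezproxify(url, proxy_suffix=PROXY_SUFFIX):
    """Append the proxy suffix to the hostname of url, leaving the rest intact."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.endswith(proxy_suffix):
        # relative link, or a link that is already proxied: leave it alone
        return url
    netloc = host + proxy_suffix
    if parts.port:
        netloc += ":%d" % parts.port
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(ezproxify("http://www.sciencemag.org/cgi/content/full/313/5785/314"))
```

Only the host part changes; the path, query string and fragment are passed through untouched, and links that are already proxied are not mangled a second time.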

This plugin rewrites outgoing links on various journal sites, as well as NCBI PubMed, to add the proxy domain (e.g. .ezproxy.lib.unimelb.edu.au), so the user doesn’t have to go to their library site to follow “ezproxy-fied” links. It makes getting full-text articles via the institutional library proxy a more seamless experience (assuming that your library subscribes to the journal).

The plugin contains a list of journal and publisher sites at which it is active, plus some “special case” code for making sure only fulltext links outgoing from NCBI PubMed are handled. You can add your own journal sites as needed.
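As a rough illustration of the site-list idea (again in Python 3 rather than the script's own JavaScript), the check below decides whether a page belongs to a listed site. The JOURNAL_SITES entries and the function name is_journal_site are made up for this sketch; the real script ships its own list, and the PubMed special case is not covered here.

```python
from urllib.parse import urlsplit

# Hypothetical site list for illustration only.
JOURNAL_SITES = ["sciencemag.org", "nature.com"]


def is_journal_site(page_url, sites=JOURNAL_SITES):
    """True if page_url is on one of the listed sites or a subdomain of one."""
    host = (urlsplit(page_url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in sites)


print(is_journal_site("http://www.sciencemag.org/cgi/content/full/313/5785/314"))
```

Matching on a leading dot (".sciencemag.org") rather than a bare suffix avoids accidentally matching unrelated hosts that merely end with the same characters.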

The user needs to edit the variable proxyname to make the script use their institution’s EZproxy … I can’t really help you with that part, since I only know that my workplace (The University of Melbourne) uses .ezproxy.lib.unimelb.edu.au .. beyond that, you are on your own 🙂 !

I’ve uploaded the Library Ezproxy Forwarder script to Userscripts.org

Software review: producing two dimensional diagrams of membrane proteins

E. coli LamB, presented using TMRPres2D. Note that the cytoplasmic/extracellular labels are incorrect, and should say extracellular/periplasmic.

I recently needed to make a simple, two dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty looking diagram to save me constructing everything by hand. Here are two I found and tried.


An OpenRef implementation

Recently, Noel O’Boyle of Noel O’Blog proposed a new RESTful scheme for resolving publications, as an alternative to using DOI or PubMed ID (PMID) identifiers. Essentially, this would allow resolution of a publication like:

EL Willighagen, NM O’Boyle, H Gopalakrishnan, D Jiao, R Guha, C Steinbeck and D J Wild Userscripts for the Life Sciences BMC Bioinformatics 2007, 8, 487.

Using something like this:

openref://BMC Bioinformatics/2007/8/487

or

http://dx.openref.org/BMC Bioinformatics/2007/8/487

Simply using the journal title, publication year, volume and first page number. Read his post for a more detailed explanation.
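To show how mechanically such a reference can be derived from a citation, here is a small Python 3 sketch. The helper names openref_path and openref_http_url are my own inventions, not part of Noel's proposal, and the base URL is simply the dx.openref.org example from above.

```python
from urllib.parse import quote


def openref_path(journal, year, page, volume=None):
    """Join the citation fields as Journal/Year[/Volume]/Page."""
    parts = [journal, str(year)]
    if volume is not None:
        parts.append(str(volume))
    parts.append(str(page))
    return "/".join(parts)


def openref_http_url(journal, year, page, volume=None,
                     base="http://dx.openref.org/"):
    """Percent-encode the path (spaces become %20, slashes are kept)."""
    return base + quote(openref_path(journal, year, page, volume))


print("openref://" + openref_path("BMC Bioinformatics", 2007, 487, volume=8))
print(openref_http_url("BMC Bioinformatics", 2007, 487, volume=8))
```

One corner case this ignores: a journal title that itself contains a slash would be indistinguishable from an extra path segment.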

While I think the scheme needs a little fleshing out, the idea is nice since, as Noel highlights, the “OpenRef” URL can be derived from the typical citation style used by academics, while the DOI and the PMID cannot (although the DOI is often printed on the journal article these days, it’s generally not used in a reference list at the end of a paper). I’m sure there are lots of corner cases that could ultimately work to over-complicate this scheme and force it to lose its simplicity … but at the moment it remains appealing.

It dawned upon me that an OpenRef resolver would actually be pretty straightforward to write with Turbogears (or just straight CherryPy), and a bit of Biopython EUtils magic to search PubMed.

So, without further ado … here’s the essential code for my quick implementation. It requires that you have installed Turbogears and made a quickstart project with tg-admin (see the Turbogears docs on how to do this). The code below should be added to the Root class in controllers.py, in addition to the autogenerated code that tg-admin makes for you:

from turbogears import controllers, expose, flash, redirect
from model import *

# from openref import model
from Bio import EUtils
from Bio.EUtils import DBIdsClient

from xml.dom import minidom
import urllib

class Root(controllers.RootController):

  # we use *args and **kw here to accept a variable number of
  # arguments and keyword arguments
  # (eg Journal/Year/Page or Journal/Year/Volume/Page)
  # turbogears passes arguments to the function from the URL like
  # http://webapp:8080/arg1/arg2/arg3?keyword=stuff&keyword2=morestuff
  # @expose() makes the method reachable from the web
  @expose()
  def openref(self, journal, *args, **kw):
      # deals with openref://Journal/Year/Page
      # (no volume argument)
      if len(args) == 2:
          year, page = args
          query = '"%s"[TA] AND "%s"[DP] AND "%s"[PG]' % \
                    (journal, year, page)
      # deal with openref://Journal/Year/Volume/Page
      # (including volume number)
      if len(args) == 3:
          year, volume, page = args
          query = '"%s"[TA] AND "%s"[DP] AND "%s"[VI] AND "%s"[PG]' % \
                    (journal, year, volume, page)
      # search NCBI PubMed with EUtils
      client = DBIdsClient.DBIdsClient()
      result = client.search(query, retmax = 1)
      res = result[0].efetch(retmode = "xml", rettype = "xml").read()
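      # get doi link from eutils XML result, example:
      #    <ArticleId IdType="pii">S0022-2836(07)01626-9</ArticleId>
      #    <ArticleId IdType="doi">10.1016/j.jmb.2007.12.021</ArticleId>
      #    <ArticleId IdType="pubmed">18187149</ArticleId>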
      xml_doc = minidom.parseString(res)
      for tag in xml_doc.getElementsByTagName("ArticleId"):
          if tag.getAttribute("IdType") == "doi":
              doi = tag.childNodes[0].data
          if tag.getAttribute("IdType") == "pubmed":
              pmid = tag.childNodes[0].data
      # make the DOI resolution URL
      doi_url = urllib.basejoin("http://dx.doi.org/", doi)
      # make the Entrez Pubmed resolution URL
      pubmed_url =  "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?\
% (pmid)
      # and lets not forget a URL to HubMed
      hubmed_url = "http://www.hubmed.org/display.cgi?uids=%s" % (pmid)
      # decide where to redirect to based on "?redirect=xxx" argument
      # (the DOI resolver is the default if no valid choice is given)
      url = doi_url
      if kw.get("redirect") == "pubmed":
          url = pubmed_url
      elif kw.get("redirect") == "hubmed":
          url = hubmed_url
      raise redirect(url)


Since this is seat-of-the-pants Friday arvo coding, there is very little in the way of error handling or exceptions in the above code. I might add some niceties like that later. If the PubMed query constructed from the URL gives no PubMed hit(s), or the PubMed result doesn’t contain a DOI, you’ll get some ugly and inelegant errors.
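For what it’s worth, here is one way the ID extraction could be made more forgiving. This is a standalone Python 3 sketch rather than a drop-in patch for the controller; extract_ids and fallback_url are hypothetical helper names, and falling back to HubMed when there is no DOI is just one possible policy.

```python
from xml.dom import minidom


def extract_ids(xml_text):
    """Return (doi, pmid) from a PubMed XML record; either may be None."""
    doc = minidom.parseString(xml_text)
    ids = {}
    for tag in doc.getElementsByTagName("ArticleId"):
        if tag.firstChild is not None:
            # keep the first value seen for each IdType
            ids.setdefault(tag.getAttribute("IdType"),
                           tag.firstChild.data.strip())
    return ids.get("doi"), ids.get("pubmed")


def fallback_url(doi, pmid):
    """Prefer the DOI resolver, fall back to HubMed, else fail clearly."""
    if doi:
        return "http://dx.doi.org/" + doi
    if pmid:
        return "http://www.hubmed.org/display.cgi?uids=%s" % pmid
    raise LookupError("no DOI or PubMed ID found for this reference")


sample = ('<ArticleIdList>'
          '<ArticleId IdType="doi">10.1016/j.jmb.2007.12.021</ArticleId>'
          '<ArticleId IdType="pubmed">18187149</ArticleId>'
          '</ArticleIdList>')
print(fallback_url(*extract_ids(sample)))
```

In the controller, the LookupError could be caught and turned into a friendlier “reference not found” page instead of a traceback. An empty search result would still need its own check before result[0] is used.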

Assuming that you run this Turbogears app locally on the default port 8080, you should be able to get redirected to the Willighagen et al Userscripts paper by going to:

http://localhost:8080/openref/BMC Bioinformatics/2007/8/487

(Firefox will properly escape the space character in the URL .. I’m not sure what other browsers may do).

By default you will be redirected to wherever dx.doi.org decides to send you (which is often the journal article at the publisher’s site, but there is no rule that says this must be the case), but you can also choose to be redirected to PubMed or HubMed using:

http://localhost:8080/openref/BMC Bioinformatics/2007/8/487?redirect=pubmed
http://localhost:8080/openref/BMC Bioinformatics/2007/8/487?redirect=hubmed

I’ve got a working example running at http://openref.pansapiens.com/ if anyone would like to try it out (eg, try http://openref.pansapiens.com/openref/BMC Bioinformatics/2007/8/487 ). No promises that it will stay up for long (Turbogears apps seem to die quite a lot on my cheap little virtual hosting account … I’m using supervisor2 now, which may help keep things more available).

It should be stressed that this is only a quick and dirty hack to demonstrate the proof of concept. It’s really only translating the ‘paths’ in the URLs provided by the user into PubMed queries, and uses the existing DOI infrastructure to ultimately redirect the user to the article; in reality I’d expect that an “OpenRef” resolver would have to be more independent and sophisticated than this. I can’t imagine who would maintain a separate OpenRef database in order to make it independent of DOIs and PubMed.

Unfortunately the domain openref.org has already been registered .. and not by Noel. Maybe it’s already time for a new name for this fledgling resolution scheme 🙂 ??

Changing "Illustration" to "Figure" in OpenOffice Writer

I’ve decided to try and use OpenOffice Writer properly .. like take advantage of some of its more powerful features rather than just using it as a text editor with formatting.

For drafting manuscripts of scientific papers, pictures/photos/illustrations etc are usually referred to as “Figures”, however when inserting a picture via “Insert -> Picture -> From File ..” the default behavior of OpenOffice is to use the caption “Illustration”. This will not do.

From the OpenOffice Writer Guide, Chapter 8 [pdf], here is how to get it to use “Figure” by default:

• Open the “Tools -> Options -> OpenOffice.org Writer -> AutoCaption” dialog box.

• In the “Add captions automatically when inserting” section, select OpenOffice.org Writer Picture, and make sure its checkbox is ticked.

• In the Category drop-down list, type the name that you want (e.g. Figure), overwriting whatever sequence name is already there (probably “Illustration”). I also like my Figure label bold, so I selected “Strong Emphasis” from the “Character Style” drop-down box. Press OK to save the changes.

Now you can insert a Picture using “Insert -> Picture -> From File ..” and the label should be “Figure”, not “Illustration”. The picture comes in its own frame, and you can edit the figure legend directly in the document.

Hmmm … LaTeX is not looking so bad again ….