One of the nice features of Google App Engine is that you can easily view logs for your application to quickly see which requests are generating errors. Browsing the logs of ResolveRef, I’ve been able to identify a few classes of query which, for one reason or another, weren’t working.
I’ve been looking at doing an analysis with some protein subfamily sequence logos, using Eric Beitz’s texshade. While it’s a little strange that it does the actual analysis part (rather than just the rendering) using LaTeX, it’s the only implementation of the method I know of, and it beats reimplementing it from the paper.
Although it was published in 2006 (and earlier in 2000), with the original URLs now dead, I noticed the latest update for the version of texshade in CTAN (v1.18) was on the 15th of April, 2008 … ie texshade was updated just 14 days ago!
It happens all too often that published bioinformatics tools cease to be updated, or even disappear from the Web, not long after the peer-reviewed publication is released. Kudos to Eric for not abandoning his software.
About two weeks ago, tipped off by Neil, I heard about Google App Engine. I managed to get a beta account, and I’ve finally had a chance to do something (hopefully) useful with it.
In the absence of any quickly achievable ideas for a bioinformatics app, I ported over the OpenRef application I wrote on top of TurboGears a few months back.
The Google Summer of Code project participants have been selected for 2008. I scanned the list to see how projects specifically aimed at the biosciences and bioinformatics fared:
- GenMAPP (Gene Map Annotator and Pathway Profiler), a tool for visualizing gene expression data on top of graphical representations of biological pathways.
- The NESCent (National Evolutionary Synthesis Centre) Phyloinformatics project has a range of potential projects to do with phylogenetic analysis, covering things like phyloXML integration with BioPerl and BioRuby, phyloinformatics web services and tree analysis using the MapReduce algorithm (with Hadoop).
- OMII-UK, which covers a range of tools including the Taverna Workbench for workflow design and execution.
- Also participating is OpenMRS, a medical record system aimed at developing countries.
There are also at least two platforms for cluster, parallel or grid computing on the list; I spotted the Globus Toolkit and OAR, but there are probably a few more in that broad category (eg, OMII-UK oversees a bunch of Grid related projects too).
It’s worth noting that I’ve ignored a bunch of really important pieces of software that are less field-specific, but are actually lower level components of the platforms critical for most large bioinformatics projects. Things like Python, Perl, R, various Open Source databases, and collaboration tools like wikis (MoinMoin) and CMSs (eg Drupal) are also participating.
I don’t think coding for bioinformatics applications is as attractive to students as working on some of the other “sexier” projects available (eg the SecondLife client, or the Apache Webserver), but kudos to Google for letting a few bioinformatics tools into the fray. Hopefully the students who hack on them learn something, and hone their coding skills (you never know, they may even help improve these tools too 🙂 ).
Recently, Noel O’Boyle of Noel O’Blog proposed a new RESTful scheme for resolving publications, as an alternative to using DOI or PubMed ID (PMID) identifiers. Essentially, this would allow resolution of a publication like:
EL Willighagen, NM O’Boyle, H Gopalakrishnan, D Jiao, R Guha, C Steinbeck and D J Wild Userscripts for the Life Sciences BMC Bioinformatics 2007, 8, 487.
Using something like this:
Simply using the journal title, publication year, volume and first page number. Read his post for a more detailed explanation.
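To make the mapping from citation to path concrete, here is a minimal sketch. It is written for Python 3, and the function name `openref_path` is my own invention for illustration, not part of Noel’s proposal. It joins the citation fields in Journal/Year/Volume/Page order and percent-encodes each one, so that journal titles containing spaces are safe to put in a URL:

```python
from urllib.parse import quote

def openref_path(journal, year, page, volume=None):
    """Build a Journal/Year[/Volume]/Page path from citation fields."""
    parts = [journal, year]
    if volume is not None:
        parts.append(volume)
    parts.append(page)
    # Percent-encode each segment so spaces in journal titles survive in a URL.
    return "/".join(quote(str(p), safe="") for p in parts)

# The Willighagen et al citation above
print(openref_path("BMC Bioinformatics", 2007, 487, volume=8))
```

Leaving the volume optional reflects the idea that a citation without a volume number should still give a usable path.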
While I think the scheme needs a little fleshing out, the idea is nice since, as Noel highlights, the “OpenRef” URL can be derived from the typical citation style used by academics, while the DOI and the PMID cannot (although the DOI is often printed on the journal article these days, it’s generally not used in a reference list at the end of a paper). I’m sure there are lots of corner cases that could ultimately work to over-complicate this scheme and force it to lose its simplicity … but at the moment it remains appealing.
So, without further ado … here’s the essential code for my quick implementation. It requires that you have installed TurboGears and made a quickstart project with tg-admin (see the TurboGears docs on how to do this). The code below should be added to the Root class in controllers.py, in addition to the autogenerated code that tg-admin makes for you:
import urllib
from xml.dom import minidom

from turbogears import controllers, expose, flash, redirect
from model import *
# from openref import model
from Bio import EUtils
from Bio.EUtils import DBIdsClient

    # (the method below goes inside the Root class)
    # we use *args and **kw here to accept a variable number of
    # arguments and keyword arguments
    # (eg Journal/Year/Page or Journal/Year/Volume/Page)
    # turbogears passes the path segments of the URL to the function
    # as positional arguments, and the query string as keyword arguments
    @expose()
    def openref(self, journal, *args, **kw):
        # deals with openref://Journal/Year/Page
        # (no volume argument)
        if len(args) == 2:
            year, page = args
            query = '"%s"[TA] AND "%s"[DP] AND "%s"[PG]' % \
                    (journal, year, page)
        # deal with openref://Journal/Year/Volume/Page
        # (including volume number)
        if len(args) == 3:
            year, volume, page = args
            query = '"%s"[TA] AND "%s"[DP] AND "%s"[VI] AND "%s"[PG]' % \
                    (journal, year, volume, page)

        # search NCBI PubMed with EUtils
        client = DBIdsClient.DBIdsClient()
        result = client.search(query, retmax = 1)
        res = result.efetch(retmode = "xml", rettype = "xml").read()

        # get doi link from eutils XML result
        xml_doc = minidom.parseString(res)
        for tag in xml_doc.getElementsByTagName("ArticleId"):
            # childNodes is a list; the ID is the text of its first node
            if tag.getAttribute("IdType") == "doi":
                doi = tag.childNodes[0].data
            if tag.getAttribute("IdType") == "pubmed":
                pmid = tag.childNodes[0].data

        # make the DOI resolution URL
        doi_url = urllib.basejoin("http://dx.doi.org/", doi)
        # make the Entrez Pubmed resolution URL
        pubmed_url = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?" \
                     "list_uids=%s&dopt=Abstract" % (pmid)
        # and lets not forget a URL to HubMed
        hubmed_url = "http://www.hubmed.org/display.cgi?uids=%s" % (pmid)

        # decide where to redirect to based on "?redirect=xxx" argument
        # (kw.get avoids a KeyError when no redirect argument is given)
        target = kw.get('redirect')
        if target == "doi":
            url = doi_url
        elif target == "pubmed":
            url = pubmed_url
        elif target == "hubmed":
            url = hubmed_url
        else:
            # no (or unknown) redirect argument: default to the DOI
            url = doi_url
        # send the browser to the chosen URL
        raise redirect(url)
Since this is seat-of-the-pants Friday arvo coding, there is very little in the way of error handling or exceptions in the above code. I might add some niceties like that later. If the PubMed query constructed from the URL gives no PubMed hit(s), or the PubMed result doesn’t contain a DOI, you’ll get some ugly and inelegant errors.
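As a rough idea of what one of those niceties might look like, here is a small standalone sketch. It is not part of the app above, and it is written for Python 3 rather than the Python 2 of the listing. The function name `extract_article_ids` and the sample XML are made up for illustration; the sample is a hand-written fragment that only mimics the ArticleId elements, not real EUtils output. The idea is simply to record a missing DOI or PMID as None, so the caller can decide what to do instead of tripping over an undefined variable:

```python
from xml.dom import minidom

def extract_article_ids(xml_text):
    """Return {'doi': ..., 'pubmed': ...}; a value is None when absent."""
    ids = {"doi": None, "pubmed": None}
    doc = minidom.parseString(xml_text)
    for tag in doc.getElementsByTagName("ArticleId"):
        id_type = tag.getAttribute("IdType")
        # Skip empty elements and ID types we do not care about.
        if id_type in ids and tag.firstChild is not None:
            ids[id_type] = tag.firstChild.data.strip()
    return ids

# Hypothetical fragment with a placeholder PMID and no DOI.
sample = """<ArticleIdList>
  <ArticleId IdType="pubmed">00000000</ArticleId>
</ArticleIdList>"""

ids = extract_article_ids(sample)
if ids["doi"] is None:
    print("No DOI found; fall back to PubMed ID", ids["pubmed"])
```

With something like this in place, a record without a DOI could fall back to the PubMed URL rather than raising an error.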
Assuming that you run this Turbogears app locally on the default port 8080, you should be able to get redirected to the Willighagen et al Userscripts paper by going to:
(Firefox will properly escape the space character in the URL .. I’m not sure what other browsers may do).
By default you will be redirected to wherever dx.doi.org decides to send you (which is often the journal article at the publisher’s site, but there is no rule that says this must be the case), but you can also choose to be redirected to PubMed or HubMed by adding ?redirect=pubmed or ?redirect=hubmed to the URL.
I’ve got a working example running at http://openref.pansapiens.com/ if anyone would like to try it out (eg, try http://openref.pansapiens.com/openref/BMC Bioinformatics/2007/8/487 ). No promises that it will stay up for long (Turbogears apps seem to die quite a lot on my cheap little virtual hosting account … I’m using supervisor2 now, which may help keep things more available).
It should be stressed that this is only a quick and dirty hack to demonstrate the proof of concept. It’s really only translating the ‘paths’ in the URLs provided by the user into PubMed queries, and uses the existing DOI infrastructure to ultimately redirect the user to the article; in reality I’d expect that an “OpenRef” resolver would have to be more independent and sophisticated than this. I can’t imagine who would maintain a separate OpenRef database in order to make it independent of DOIs and PubMed.
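To show just how thin that translation layer is, here is the path-to-query step pulled out as a standalone function. This is a Python 3 sketch, and the name `pubmed_query` is hypothetical. It uses the same PubMed field tags as the listing ([TA], [DP], [VI], [PG]) and, unlike the quick hack, complains if the path has the wrong number of parts:

```python
def pubmed_query(segments):
    """Translate ['Journal', 'Year', ('Volume',) 'Page'] into a PubMed query."""
    if len(segments) == 3:
        fields = ("TA", "DP", "PG")
    elif len(segments) == 4:
        fields = ("TA", "DP", "VI", "PG")
    else:
        raise ValueError("expected Journal/Year/Page or Journal/Year/Volume/Page")
    # Pair each path segment with its field tag and AND the terms together.
    return " AND ".join('"%s"[%s]' % (value, tag)
                        for value, tag in zip(segments, fields))

print(pubmed_query("BMC Bioinformatics/2007/8/487".split("/")))
```

Everything else in the resolver is a lookup against PubMed followed by a redirect.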
Unfortunately the domain openref.org has already been registered .. and not by Noel. Maybe it’s already time for a new name for this fledgling resolution scheme 🙂 ??