Running a local JABAWS server for Jalview on Ubuntu (11.04 Natty)

The excellent Jalview sequence alignment visualization and editing tool has the ability to send a set of sequences to a multiple sequence alignment web service (“JABAWS”) and receive the results in a new alignment window. This is really convenient when you are doing lots of sequence analysis, and Geoff Barton’s group at the University of Dundee provide a JABAWS server that Jalview will use by default.

But maybe the Dundee server is down. Or maybe you think your local machine will do things faster. Or maybe you work on über secret sequences in some Faraday cage bunker with no permanent network connection. In each of these cases, you may want to run your own local JABAWS server and use that instead. In this case, read on.

Stack Exchange sites for science

Recently I’ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow – it’s a question and answer ‘forum’ for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted service called Stack Exchange, which allows anyone to start their own “Stack Overflow” based around any topic.

The service was a little pricey ($129+/month), and I suspect this is one reason why a few open source clones inspired by Stack Overflow also exist. Since then, Stack Exchange sites (or clones) have proliferated – and those working as scientists (or those interested in science) haven’t been neglected. Here are my favorites:
  • MajorGroove pitches itself as a ‘forum for biologists’, which it is, although most of the content currently focuses on X-ray crystallography and associated techniques. It is currently in ‘bootstrap mode’, which means that reputation requirements are a little less strict until the userbase and site activity have grown to a critical size. Is there even a need for a Stack Exchange forum for biological crystallography ? Macromolecular crystallography already has a single, central, de facto standard forum – the CCP4BB mailing list. While it may be antiquated by Web2.0 standards, CCP4BB works well for a lot of people, and there is a huge amount of useful and important information buried in its archives. For many crystallographers, it seems CCP4BB would only be extracted from their “cold dead hands”. Despite this, I think the Stack Overflow format will be very beneficial for people new to the field. As a side note – I discovered MajorGroove via Graeme Winter’s XIA2 blog right around the time when I was considering kickstarting a “Stack Overflow for crystallography”. At the moment it seems that a small userbase of crystallographers is already established on MajorGroove and there would be no purpose for another near identical forum. Even if questions about other techniques in the biosciences start to dilute out the structural biology, one click on the ‘crystallography’ tag or the ‘ccp4’ tag, and you can get straight to the good stuff. (In fact this feature was deemed useful enough by Google that they decided to bless the ‘android’ tag on Stack Overflow as the official Android Q&A forum).
  • NMRWiki Q&A is a StackExchange clone for magnetic resonance, mostly focused on NMR, but also open to EPR/ESR and MRI users. It’s not actually running on the StackExchange platform, but uses the open source OSQA / CNPROG clone, built on top of Django. As far as I know, there is no “CCP4BB for NMR”, which makes the NMRWiki Q&A site potentially even more valuable to structural biologists than its crystallography-centric cousin, MajorGroove. Back when I was doing my PhD using protein NMR spectroscopy as my primary technique, there were very few good resources like this online – I do less NMR these days, but you can bet that I’ll be using the NMRWiki Q&A and its associated wiki to refresh my memory and catch up on new methodological developments in the future.
  • BioStar, a StackExchange for bioinformatics, computational genomics and systems biology questions and answers. This one is busier and better established than the above-mentioned forums, probably by virtue of the fact that bioinformaticians spend more time in front of the computer than your average molecular biologist or structural biologist.
  • And, for a bit of fun: Skeptic Exchange, which covers rational questions and answers to various topics including pseudoscience, faith healing, the supernatural and alternative medicine.

Want more ? There are a bunch of science-related StackExchanges listed under “Science” here: .. and digging back through the FriendFeed archives I see Matt Todd initiated a concise listing (had I seen it earlier, I probably never would have started this post).

And now, the latest* news: Stack Exchange 2.0 will be ‘free’. It looks like they are trying to structure the new Stack Exchange ecosystem a bit like the Usenet hierarchy (comp.*, rec.* etc), with a fairly involved discussion, proposal and acceptance process for new sites – it’s unclear yet whether this approach is going to work out better than just open sourcing the whole shebang, but time will tell. My guess is that BioStar, MajorGroove and probably even an incarnation of NMRWiki Q&A will eventually become part of this formalized ecosystem.

On one hand making StackExchange sites free to run is great – it lowers the barrier to entry to allow many more sites to emerge and operate. On the other hand, as we have seen with the acquisition of FriendFeed by Facebook, not having a clear revenue stream can ultimately leave communities (such as The Life Scientists) without any certainty in a site’s future, potentially impacting growth and participation. Personally I’m much more inclined to invest time in a site if it is something like Wikipedia, where I know my contributions are very likely to live on, in some form, for decades (centuries ?) to come. Ideally the archives of these new Stack Exchange sites could become useful online resources for decades to come – but with a single company at the helm and a “Web 2.0 business model”, continued operation for even a decade seems unlikely. The one saving grace: all content on the new Stack Exchange sites will be licensed under a Creative Commons license – so if Stack Exchange itself is acquired and shut down, we will always be able to preemptively leech the archives and provide them online elsewhere. Maybe it’s strange that I’m already thinking about archiving the new Stack Exchange upon its demise before it’s even begun, but I think it’s important to take the long term view with our data and recorded wisdom. Unlike in 1994, when GeoCities (R.I.P) was started, teh Internets is no longer a fad – the hard disks connected to it are fast becoming the sum of all accessible human knowledge, so we’d better make sure we can retain the good bits for a little longer than 10 years.

* – as is all too common these days, I’m a little behind the curve on this one. I meant to finish this post a month ago, but with a busy time pre-holiday, then the actual holiday, a month has gone by.

Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# by counting header lines (lines that start with ">")
grep -c "^>" "$1"

Dead easy huh ? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations, as a quick sanity check.
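For anyone who wants to try the idea before installing anything, here is a minimal sketch of the same count run against a throwaway file. The three sequences and the temporary file are made up purely for illustration, and the count anchors the match to the start of the line (an assumption on my part that every FASTA header begins with “>” in the first column).

```shell
# Hypothetical demo data: a small FASTA file with three made-up sequences.
demo_fasta="$(mktemp)"   # temporary file, used only for this illustration
cat > "$demo_fasta" <<'EOF'
>seq1 example protein
MKTAYIAKQR
>seq2 another example
GAVLIPFMW
>seq3
ACDEFGHIK
EOF

# Same idea as countseqs: count the lines that begin with ">".
count=$(grep -c '^>' "$demo_fasta")
echo "$count"

# Clean up the temporary file.
rm -f "$demo_fasta"
```

With the script from above on your PATH, `countseqs somefile.fasta` should report the same kind of number for a real file.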

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂

Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2

Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the official wwwblast documentation at NCBI. These instructions are for Ubuntu 8.04, but probably will work with many recent flavours of Debian.

Software review: producing two dimensional diagrams of membrane proteins

E. coli LamB, presented using TMRPres2D. Note that the cytoplasmic/extracellular labels are incorrect, and should say extracellular/periplasmic.

I recently needed to make a simple, two-dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty-looking diagram to save me constructing everything by hand. Here are two I found and tried.
