Software review: producing two-dimensional diagrams of membrane proteins

E. coli LamB, presented using TMRPres2D. Note that the cytoplasmic/extracellular labels are incorrect, and should say extracellular/periplasmic.

I recently needed to make a simple, two-dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty-looking diagram to save me constructing everything by hand. Here are two I found and tried.


FoldIt – Crowdsourcing to solve the protein folding problem

David Baker’s lab and friends have recently released a new ‘experiment’ in protein folding called FoldIt. Essentially, individuals or teams can compete online to manually fold protein structures, guided by the internal energy function within the game (it very likely uses code from the impressive ab initio folding software Rosetta under the hood). The interface is designed as a game to make it accessible to everyone, not just experts in protein folding. While it’s pretty simplified compared with your average molecular structure editing software, I think designers of scientific software (often scientists themselves) should take note: a good clean interface can really assist in getting a specific job done painlessly. I haven’t played enough with it yet, but I get the feeling that FoldIt could be a nice way to introduce some protein structure concepts to undergraduates too.

There were the usual complaints on Slashdot that FoldIt doesn’t have a Linux version. Well, I’m happy to report that it seems to run alright using Wine (on Ubuntu Hardy Heron). I couldn’t log in to try the competitive puzzles, but I suspect the server is just in the midst of a Slashdotting. I’ll try later.

FoldIt screenshot, running under Wine

From the FoldIt FAQ:

Can humans really help computers fold proteins?
We’re collecting data to find out if humans’ pattern-recognition and puzzle-solving abilities make them more efficient than existing computer programs at pattern-folding tasks. If this turns out to be true, we can then teach human strategies to computers and fold proteins faster than ever!

Not sure where I saw it, but I remember reading an argument that the future of crowdsourcing would be to not just blindly trust the whole crowd, but also identify experts in the crowd and weight their predictions more strongly. I’d say this will be the case with ‘manual’ protein folding – just like some players become l33t at first-person shooters (like my favorite, RTCW: Enemy Territory, which despite enjoying, I’m not so l33t at), and could beat any AI player that doesn’t cheat… some people will probably become pretty good at folding up proteins. Maybe FoldIt will identify them, and they can make their gaming skills useful, and teach their tricks to software to automate the process. Or maybe it will just remain a fun-ish puzzle game 🙂

Qutemol using Cedega

Pawel over at Freelancing Science recently highlighted Qutemol, a nice looking molecular viewer that does real-time ambient occlusion rendering. There isn’t any official Linux version, but I found that the Windows version runs okay on Linux using Cedega (a version of Wine that has better DirectX support, especially for games). Since Cedega is based on the Open Source Wine code, you can compile your own command line version … but it’s a good idea to buy a maintenance subscription from Transgaming and support its further development, if you can afford it.

Here’s a screenshot of Qutemol running under Cedega on Ubuntu Gutsy Gibbon, just to prove it.

Qutemol running under Cedega

No, it’s not Photoshopped … (or GIMPed) … 🙂

ARIA version 2.2 released

I don’t usually post about NMR (Nuclear Magnetic Resonance) and structural biology related stuff, but I’ve always intended to. In this post I’m pulling out all the stops on specialist lingo and assumed background knowledge, so hopefully it isn’t too incomprehensible to the non-structural biology crowd :).

ARIA version 2.2 has been released in the last few weeks. ARIA is an automated NOE assignment and structure calculation package, which (in theory) takes some of the pain and slowness out of producing protein (and DNA and/or RNA) structures from Nuclear Magnetic Resonance data. I’ll say up front: I haven’t tried this version yet, but some of the improvements look exciting.

Here are two new features worth noting … followed by what I think it all means:

  • The assignment method has been improved with the introduction of a network-anchoring analysis (Herrmann et al., 2002) for filtering of the initial assignments.
  • The integration of the CCPN has been completed. The imported CCPN distance constraints lists can enter the ARIA process for calibration, violation analysis and network-anchoring analysis. The final constraint lists can be exported as well.

In the past I have done some quick and dirty tests comparing the quality of protein structures produced using ARIA 2.1 vs. Peter Güntert’s CYANA 1.07 and 2.1, using the exact same NMR peak input lists (with slightly noisy data containing a number of incorrectly picked peaks). CYANA always won hands down, assigning more NOE crosspeaks correctly and producing an ensemble of model structures with much lower RMSD and generally better protein structure quality scores (i.e. using pretty much any decent pairwise pseudo-energy potential, and Procheck). Also, ARIA produced ‘knotted’ structures which were almost certainly incorrect, while CYANA did not. Other postdocs and students in my former lab had done similar independent tests with ARIA 1.2 vs. CYANA 1.0.7, and had come to similar conclusions.

The disclaimer: It should be noted here that assessment of the quality of an ensemble of NMR structure coordinates can be problematic, and is really the topic of another long post (and probably tens if not hundreds of peer-reviewed journal articles). So saying “CYANA version X is better than ARIA version X” based on the RMSD of the final calculated ensemble is a bit unfair … in fact using RMSD of the ensemble to gauge structure quality is just plain wrong in this context. In my (unpublished, non-peer reviewed) tests, it is possible that ARIA could be producing high RMSD but essentially ‘correct’ structures, while CYANA could be producing tightly defined but ‘incorrect’ structures, but I doubt it. The gap between the output of each program was wide enough to suggest that under real-world conditions where the input peak list contained a number of ‘noise’ peaks, ARIA was failing to give a set of consistent solutions (probably due to lack of NOE assignments), while CYANA was giving a set of tightly defined structures (which may or may not have represented the ‘correct’ solution). Other evaluations (protein structure quality measures, Procheck, comparison to known structures of similar proteins) indicated that the CYANA structures were not grossly ‘incorrect’, so I’d say CYANA was just giving a better defined (i.e. lower ensemble RMSD) set of plausible solutions.
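To make the “ensemble RMSD” point concrete, here is a minimal Python sketch of one common way to summarise how tightly defined an ensemble is: the mean pairwise RMSD between models. It is only an illustration, not what ARIA or CYANA do internally. The function names and the toy coordinates are made up, and the models are assumed to be already superimposed with atoms listed in the same order (a real calculation would fit the models onto each other first).

```python
import math
from itertools import combinations


def rmsd(model_a, model_b):
    """RMSD between two models given as equal-length lists of (x, y, z) tuples.

    Assumes the models are already superimposed and atoms are in matching order.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(squared / len(model_a))


def mean_pairwise_rmsd(ensemble):
    """Average RMSD over every pair of models in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy two-atom 'ensembles' (hypothetical coordinates, in Angstrom).
tight_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
]
loose_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]

tight_value = mean_pairwise_rmsd(tight_ensemble)
loose_value = mean_pairwise_rmsd(loose_ensemble)
print(f"tight: {tight_value:.2f}  loose: {loose_value:.2f}")
```

Note that nothing in this number refers to the true structure: it only measures how well the models agree with each other, which is exactly why a low ensemble RMSD on its own says nothing about whether the structures are ‘correct’.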

My gut feeling is that ARIA 2.2 will perform much better than past versions, due to one key feature that has been ‘borrowed’ from CYANA: the introduction of a network-anchoring analysis. In a nutshell, network-anchoring scores weight distance constraints (or NOE assignments) based on how ‘connected’ each constraint is within the graph formed by the other constraints. This means that in effect a single, isolated constraint pulling two residues on opposite sides of a protein together is down-weighted, while if multiple constraints link those residues (or their neighboring residues) then those constraints are considered more trustworthy and hence weighted more heavily. For better or worse (usually better), this score simulates what the human NMR spectroscopist would do when assigning NOE crosspeaks manually … usually two residues in contact will show multiple NOE crosspeaks connecting them and involve multiple different nuclei, whereas a single lonely NOE between two nuclei which are distant from each other in the primary protein sequence is heavily scrutinized and regarded with suspicion, since it is likely to be mis-assigned. I’m very keen to test ARIA 2.2 on my old data set and see if I’m actually right (I may be able to try it with network anchoring turned on, and off, and see just what sort of contribution that score is making).
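The idea is easier to see with a toy example. The Python sketch below is a deliberately simplified illustration of the ‘connectedness’ intuition and not the actual scoring used by CYANA or ARIA: it works at the residue level only, counts how many other constraints link the same or neighboring residues, and turns that count into a weight. The function names, the `window` and `floor` parameters and the example residue numbers are all assumptions of mine.

```python
def network_support(constraints, window=1):
    """Count, for each constraint, how many OTHER constraints back it up.

    constraints: list of (residue_i, residue_j) pairs.
    A constraint (a, b) supports (i, j) if a is within `window` residues of i
    and b is within `window` residues of j (in either orientation).
    """
    support = []
    for k, (i, j) in enumerate(constraints):
        count = 0
        for m, (a, b) in enumerate(constraints):
            if m == k:
                continue  # a constraint does not support itself
            same = abs(a - i) <= window and abs(b - j) <= window
            swapped = abs(a - j) <= window and abs(b - i) <= window
            if same or swapped:
                count += 1
        support.append(count)
    return support


def weights_from_support(support, floor=0.1):
    """Scale support counts to weights between `floor` and 1.0.

    Isolated constraints (support 0) get the `floor` weight rather than zero,
    so they are down-weighted but not thrown away.
    """
    top = max(support, default=0)
    if top == 0:
        return [floor for _ in support]
    return [floor + (1.0 - floor) * s / top for s in support]


# Hypothetical residue-level constraints: a cluster of mutually consistent
# contacts between residues 10-11 and 45-46, plus one lonely long-range one.
constraints = [(10, 45), (10, 46), (11, 45), (11, 46), (3, 80)]

scores = network_support(constraints)
weights = weights_from_support(scores)
for pair, score, weight in zip(constraints, scores, weights):
    print(pair, score, round(weight, 2))
```

In this toy data the four clustered constraints each get support from the other three, while the lone (3, 80) constraint gets none and ends up with the lowest weight, which mirrors the suspicion a spectroscopist would have about it.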

Another completed feature, the integration between ARIA and the CCPN libraries/analysis package, should also be a big plus. I haven’t used the CCPN analysis software yet, but a few years ago I wrote some code to help make CYANA and the Sparky NMR assignment program work together better. The result was functional, but very hackish (and I’m probably the only person in the world who understands how it was intended to be used, since I still haven’t got around to writing any documentation. Naughty, naughty). CCPN + ARIA may turn out to be the better option for spectral analysis and structure calculation in the future, as opposed to my currently preferred Sparky + CYANA combination.

I’m really itching to find a good reason to do an NMR structure project now … back to work !!