Stack Exchange sites for science

Recently I’ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow – it’s a question and answer ‘forum’ for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted service called Stack Exchange, which allows anyone to start their own “Stack Overflow” based around any topic.

Photo: http://www.flickr.com/photos/alicebartlett/ / CC BY-NC 2.0

The service was a little pricey ($129+/month), and I suspect this is one reason why a few open source clones inspired by Stack Overflow also exist. Since then, Stack Exchange sites (or clones) have proliferated – and those working as scientists (or those interested in science) haven’t been neglected. Here are my favorites:
  • MajorGroove.org pitches itself as a ‘forum for biologists’, which it is, although most of the content currently focuses on X-ray crystallography and associated techniques. It is currently in ‘bootstrap mode’, which means that reputation requirements are a little less strict until the userbase and site activity have grown to a critical size. Is there even a need for a Stack Exchange forum for biological crystallography? Macromolecular crystallography already has a single, central, de facto standard forum – the CCP4BB mailing list. While it may be antiquated by Web 2.0 standards, CCP4BB works well for a lot of people, and there is a huge amount of useful and important information buried in its archives. For many crystallographers, it seems CCP4BB would only be extracted from their “cold dead hands”. Despite this, I think the Stack Overflow format will be very beneficial for people new to the field. As a side note – I discovered MajorGroove via Graeme Winter’s XIA2 blog right around the time when I was considering kickstarting a “Stack Overflow for crystallography”. At the moment it seems that a small userbase of crystallographers is already established on MajorGroove, so there would be no purpose for another near-identical forum. Even if questions about other techniques in the biosciences start to dilute the structural biology, one click on the ‘crystallography’ tag or the ‘ccp4’ tag, and you can get straight to the good stuff. (In fact this feature was deemed useful enough by Google that they decided to bless the ‘android’ tag on Stack Overflow as the official Android Q&A forum.)
  • NMRWiki Q&A (http://qa.nmrwiki.org/) is a Stack Exchange clone for magnetic resonance, mostly focused on NMR, but also open to EPR/ESR and MRI users. It’s not actually running on the Stack Exchange platform, but uses the open source OSQA / CNPROG clone, built on top of Django. As far as I know, there is no “CCP4BB for NMR”, which makes the NMRWiki Q&A site potentially even more valuable to structural biologists than its crystallography-centric cousin, MajorGroove. Back when I was doing my PhD using protein NMR spectroscopy as my primary technique, there were very few good resources like this online – I do less NMR these days, but you can bet that I’ll be using the NMRWiki Q&A and its associated wiki to refresh my memory and catch up on new methodological developments in the future.
  • BioStar (http://biostar.stackexchange.com/) is a Stack Exchange for bioinformatics, computational genomics and systems biology questions and answers. This one is busier and better established than the forums mentioned above, probably by virtue of the fact that bioinformaticians spend more time in front of the computer than your average molecular biologist or structural biologist.
  • And, for a bit of fun: Skeptic Exchange (http://exchange.bristolskeptics.co.uk/), which covers rational questions and answers to various topics including pseudoscience, faith healing, the supernatural and alternative medicine.

Want more? There are a bunch of science-related Stack Exchanges listed under “Science” at http://meta.stackexchange.com/questions/4/list-of-stackexchange-sites, and digging back through the FriendFeed archives I see Matt Todd initiated a concise listing (if I’d seen it earlier, I probably never would have started this post).

And now, the latest* news: Stack Exchange 2.0 will be ‘free’. It looks like they are trying to structure the new Stack Exchange ecosystem a bit like the Usenet hierarchy (comp.*, rec.* etc), with a fairly involved discussion, proposal and acceptance process for new sites – it’s unclear yet whether this approach is going to work out better than just open sourcing the whole shebang, but time will tell. My guess is that BioStar, MajorGroove and probably even an incarnation of NMRWiki Q&A will eventually become part of this formalized ecosystem.

On one hand, making Stack Exchange sites free to run is great – it lowers the barrier to entry, allowing many more sites to emerge and operate. On the other hand, as we have seen with the acquisition of FriendFeed by Facebook, not having a clear revenue stream can ultimately leave communities (such as The Life Scientists) without any certainty about a site’s future, potentially impacting growth and participation. Personally I’m much more inclined to invest time in a site if it is something like Wikipedia, where I know my contributions are very likely to live on, in some form, for decades (centuries?) to come. Ideally the archives of these new Stack Exchange sites would become similarly long-lived online resources – but with a single company at the helm and a “Web 2.0 business model”, continued operation for even a decade seems unlikely. The one saving grace: all content on the new Stack Exchange sites will be licensed under a Creative Commons license – so if Stack Exchange itself is acquired and shut down, we will always be able to preemptively leech the archives and provide them online elsewhere. Maybe it’s strange that I’m already thinking about archiving the new Stack Exchange upon its demise before it’s even begun, but I think it’s important to take the long-term view with our data and recorded wisdom. Unlike in 1994, when GeoCities (R.I.P.) was started, teh Internets is no longer a fad – the hard disks connected to it are fast becoming the sum of all accessible human knowledge, so we’d better make sure we can retain the good bits for a little longer than 10 years.

* – as is all too common these days, I’m a little behind the curve on this one. I meant to finish this post a month ago, but with a busy time pre-holiday, then the actual holiday, a month has gone by.

Searching bioinformatic databases with YubNub

You may already be familiar with YubNub; it describes itself as “the social command line for the web”. Most commands consist of two (or more) words … one for the search engine, the other for the query.

For example, typing:

gg open science on friendfeed

into the YubNub search box searches Google for “open science on friendfeed“, via YubNub.
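If you’d rather fire off YubNub commands from a terminal, here’s a minimal sketch of how the same query could be turned into a URL. It assumes YubNub accepts commands at https://yubnub.org/parser/parse?command=… (an assumption on my part, so check it before relying on it), and the yubnub_url function name is made up for illustration. It only converts spaces to “+”; anything fancier (ampersands, quotes) would need proper URL encoding.

```shell
#!/bin/sh
# Hypothetical helper: build a YubNub URL from a command typed as arguments.
# ASSUMPTION: YubNub parses commands passed via /parser/parse?command=...
yubnub_url() {
    # "$*" joins all arguments with single spaces; tr then swaps spaces for '+'.
    query=$(printf '%s' "$*" | tr ' ' '+')
    printf 'https://yubnub.org/parser/parse?command=%s\n' "$query"
}

# Prints the URL for the Google search example.
yubnub_url gg open science on friendfeed
```

The printed URL can then be handed to whatever browser launcher you normally use.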

I thought I’d highlight a few life science- and bioinformatics-related YubNub commands I find myself using quite often in my day-to-day work. Some are commands I created, others someone else created. This is the beauty of YubNub … often someone has already made the ‘obvious’ command … it’s worth just trying to search with a command you expect to exist, since it often does.

Onward, with the list:

Continue reading

Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# by counting header lines, i.e. lines that start with ">"
grep -c '^>' "$1"

Dead easy huh ? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations, as a quick sanity check.
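For the curious, here’s a slightly expanded sketch of the same idea, wrapped in a function so it can read from a pipe or take several files at once. The function name count_fasta_records and the demo sequences are made up for illustration; it counts only lines that begin with “>”, which assumes header lines always start in the first column.

```shell
#!/bin/sh
# Hypothetical helper (not the original script): count FASTA records
# by counting lines that begin with ">".
count_fasta_records() {
    if [ "$#" -eq 0 ]; then
        # No file arguments: read sequences from standard input.
        grep -c '^>'
    else
        # One "file<TAB>count" line per file given.
        for fasta in "$@"; do
            printf '%s\t%s\n' "$fasta" "$(grep -c '^>' "$fasta")"
        done
    fi
}

# Made-up example: two records, the second spanning two sequence lines.
printf '>seq1 demo\nACGT\n>seq2 demo\nGGCC\nTTAA\n' | count_fasta_records
```

Run as is, the demo pipe should report 2, one per header line, regardless of how many lines each sequence is wrapped over.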

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂

Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2

Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the official wwwblast documentation at NCBI. These instructions are for Ubuntu 8.04, but probably will work with many recent flavours of Debian.
Continue reading

texshade: useful, and still kickin’

I’ve been looking at doing an analysis with some protein subfamily sequence logos, using Eric Beitz’s texshade. While it’s a little strange that it does the actual analysis part (rather than just the rendering) using LaTeX, it’s the only implementation of the method I know of, and it beats reimplementing it from the paper.

Although it was published in 2006 (and earlier in 2000), with the original URLs now dead, I noticed the latest update for the version of texshade in CTAN (v1.18) was on 15th of April, 2008 … i.e. texshade was updated just 14 days ago!

It happens all too often that published bioinformatics tools cease to be updated or even disappear from the Web not long after the peer-reviewed publication is released. Kudos to Eric for not abandoning his software.