Tag Archive for 'bioinformatics'

Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# -c prints the count of matching lines; ^ anchors ">" to the start of a header line
grep -c "^>" "$1"

Dead easy, huh? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations as a quick sanity check.
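
If you want something a touch more defensive, here is a rough sketch of a variant. It only counts lines that start with “>” (so a stray “>” inside a description line isn't counted), and it takes several files or reads from stdin. The function form and the tab-separated output are my own assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical variant of countseqs: several files, or standard input.
# A sequence is counted once per header line, i.e. a line starting with ">".
countseqs() {
    if [ "$#" -eq 0 ]; then
        # No file arguments: count headers arriving on standard input.
        grep -c '^>'
    else
        # One line of output per file: filename, a tab, then the count.
        for f in "$@"; do
            printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
        done
    fi
}
```

With that function defined, something like `countseqs a.fasta b.fasta` gives one count per file, and `zcat seqs.fasta.gz | countseqs` should work for compressed files without unpacking them first.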

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it :)

Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2

Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the official wwwblast documentation at NCBI. These instructions are for Ubuntu 8.04, but probably will work with many recent flavours of Debian.

texshade: useful, and still kickin’

I’ve been looking at doing an analysis with some protein subfamily sequence logos, using Eric Beitz’s texshade. While it’s a little strange that it does the actual analysis part (rather than just the rendering) using LaTeX, it’s the only implementation of the method I know of, and it beats reimplementing it from the paper.

Although it was published in 2006 (and earlier in 2000) and the original URLs are now dead, I noticed that the latest update to the version of texshade on CTAN (v1.18) was on the 15th of April, 2008 … i.e. texshade was updated just 14 days ago!

It happens all too often that published bioinformatics tools cease to be updated, or even disappear from the Web, not long after the peer-reviewed publication is released. Kudos to Eric for not abandoning his software.

Announcing ResolveRef on Google App Engine

About two weeks ago, tipped off by Neil, I heard about Google App Engine. I managed to get a beta account, and I’ve finally had a chance to do something (hopefully) useful with it.

In the absence of any quickly achievable ideas for a bioinformatics app, I ported over the OpenRef application I wrote on top of TurboGears a few months back.

Just like the original, the new app, ResolveRef, is essentially a RESTful way of doing PubMed queries.