Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants it’s own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
grep ">" $1 | wc -l

Dead easy huh ? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations, as a quick sanity check.

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂

Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2

Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the official wwwblast documentation at NCBI. These instructions are for Ubuntu 8.04, but probably will work with many recent flavours of Debian.
Continue reading