Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# Count header lines only: '^>' matches ">" at the start of a line
grep -c '^>' "$1"

Dead easy, huh? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations as a quick sanity check.
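
If you want the same sanity check across several files at once, something like the following might do. It's only a sketch: the function name countseqs_multi is made up for illustration, and it assumes every sequence header starts with ">" in the first column, so it anchors the match there and lets grep -c do the counting.

```shell
#!/bin/sh
# Hypothetical multi-file variant (not the original script):
# prints "<file><TAB><count>" for each file given as an argument.
countseqs_multi() {
    for f in "$@"; do
        # grep -c prints the number of matching lines; '^>' only matches
        # ">" at the start of a line. grep exits non-zero when the count
        # is 0, so "|| true" keeps that from looking like a failure.
        n=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$f" "$n"
    done
}

# Run on the command-line arguments when any are given
if [ "$#" -gt 0 ]; then
    countseqs_multi "$@"
fi
```

Saved as a script, it could be called as `countseqs_multi *.fasta` (or whatever you name it) to get one line per file.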

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂