Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.
It barely warrants it’s own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
grep ">" $1 | wc -l
Dead easy huh ? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations, as a quick sanity check.
(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).
Couldn’t help myself … everyone else is doing it
