Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# Usage: countseqs myFasta.fasta
# "$1" is quoted so filenames containing spaces still work
grep ">" "$1" | wc -l

Dead easy, huh? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations as a quick sanity check.
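Here is a quick way to convince yourself it works: a minimal sketch that builds a throwaway FASTA file and runs the same pipeline on it. The temporary file and the three records in it are made up for illustration.

```shell
#!/bin/sh
# Create a throwaway FASTA file with three records (made-up example data).
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1 first record
ACGTACGT
>seq2 second record
GGCCTTAA
TTGGCCAA
>seq3 third record
ACGT
EOF

# Same pipeline as countseqs: keep the lines containing ">", then count them.
count=$(grep ">" "$fasta" | wc -l)
echo "$count"

rm -f "$fasta"
```

The file has three header lines, so the count should come out as three. Some versions of wc pad the number with leading spaces, which is harmless for a sanity check.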

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂

5 thoughts on “Count the number of sequences in a FASTA format file: a Unix shell snippet”

  1. Yeh, good point. “grep -c” is even POSIX compliant, so I guess that would work with various versions of grep. This page makes fun of my way, and even shows why your way is usually practically superior.

    Back when I wrote it, I think I did it with the pipe because I was in the habit of piping various outputs into “wc” to count the lines, so it seemed a bit more general than using a grep-specific feature.

  2. The following line should be faster (useful if you have to do it thousands of times), since it only looks at the beginnings of lines:
    grep -c '^>' myFasta.fasta
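For anyone who wants the “grep -c” variant from the comments as a drop-in, here is a rough sketch. The function name and the test data are made up for illustration; this is one way to write it, not a replacement for the script in the post.

```shell
#!/bin/sh
# Hypothetical variant of countseqs that lets grep do the counting itself.
countseqs() {
    # -c prints the number of matching lines instead of the lines themselves.
    # '^>' only matches lines that START with ">", i.e. FASTA header lines.
    # "--" stops option parsing, so a filename beginning with "-" is still safe.
    grep -c '^>' -- "$1"
}

# Made-up test data: two records, one header with an extra ">" in its description.
tmp=$(mktemp)
printf '>seq1\nACGT\n>seq2 note: a > b\nGGCC\n' > "$tmp"

n=$(countseqs "$tmp")
echo "$n"

rm -f "$tmp"
```

One thing to keep in mind if you use this inside a larger script: grep exits with status 1 when nothing matches, so an empty file prints 0 but also reports a non-zero exit status.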
