Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
grep ">" "$1" | wc -l

Dead easy, huh? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs), and use it in lots of situations as a quick sanity check.
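As a quick illustration of that sanity check, here is a minimal sketch that builds a tiny three-record FASTA file and counts its headers the same way the script does. The file name and its contents are made up for the demonstration, and it assumes mktemp is available on your system.

```shell
#!/bin/sh
# Build a throwaway FASTA file (hypothetical name and contents) in a temp dir.
tmpdir=$(mktemp -d)
fasta="$tmpdir/example.fasta"
cat > "$fasta" <<'EOF'
>seq1 first example record
ACGTACGTACGT
>seq2 second example record
GGGTTTAAACCC
TTTT
>seq3 third example record
ACGT
EOF

# Same approach as countseqs: one ">" header line per sequence.
# tr strips the padding that some wc implementations put around the number.
count=$(grep ">" "$fasta" | wc -l | tr -d ' ')
echo "$count sequences"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

The second record spans two sequence lines but still counts once, because only header lines contain “>”.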

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it :)

The Count the number of sequences in a FASTA format file: a Unix shell snippet by Andrew Perry, unless otherwise expressly stated, is licensed under a Creative Commons CC0 1.0 Universal License.

5 thoughts on “Count the number of sequences in a FASTA format file: a Unix shell snippet”

  1. Yeh, good point. “grep -c” is even POSIX compliant, so I guess that would work with various versions of grep. This page makes fun of my way, and even shows why your way is usually practically superior.

    Back when I wrote it, I think I did it with the pipe because I was in the habit of piping various outputs into “wc” to count the lines, so it seemed a bit more general than using a grep-specific feature.

  2. The following line should be faster (useful if you have to do it thousands of times), since it only looks at the beginnings of lines:
    grep -c '^>' myFasta.fasta
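Anchoring the pattern can also change the result, not just the speed. The sketch below compares the two patterns on a made-up file that starts with an old-style “;” comment line containing a “>”. The helper name count_headers and the sample data are hypothetical, and it assumes mktemp is available.

```shell
#!/bin/sh
# count_headers is a hypothetical helper: count lines that begin with ">".
count_headers() {
    grep -c '^>' "$1"
}

# Throwaway input file with a comment line that happens to contain ">".
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
; illustrative comment line: keep sequences of length > 10
>seq1
ACGTACGTACGT
>seq2
GGGTTTAAACCC
EOF

anchored=$(count_headers "$tmpfile")   # only lines starting with ">"
unanchored=$(grep -c '>' "$tmpfile")   # any line containing ">"
echo "anchored: $anchored, unanchored: $unanchored"

# Clean up the temporary file.
rm "$tmpfile"
```

On this input the unanchored pattern also counts the comment line, so it reports one more than the number of sequences.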
