Count the number of sequences in a FASTA format file: a Unix shell snippet

Sometimes it’s nice to quickly check how many sequences are in a FASTA format sequence file.

It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA “flat-file database”, based on the presence of the “>” header symbol.

#!/bin/sh
# ~/bin/countseqs
# Counts the number of sequences in a FASTA format file
# (lines starting with ">"; "$1" is quoted so paths with spaces work)
grep -c "^>" "$1"

Dead easy, huh? I put this in ~/bin/countseqs, make it executable (chmod +x ~/bin/countseqs) and use it in lots of situations as a quick sanity check.
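If you want the same check across several files at once, or on a stream piped from another command, a small variation might look like the sketch below. The function name count_seqs is made up for illustration, and it assumes every record starts with a ">" at the beginning of a line.

```shell
#!/bin/sh
# Sketch only: count_seqs is a hypothetical helper name, not part of the
# original ~/bin/countseqs script.
count_seqs() {
    # With no arguments, cat reads standard input; with arguments it
    # concatenates the named files, so one total is printed for all of them.
    # grep -c prints the number of matching lines, and '^>' only matches
    # lines that begin with the FASTA header symbol.
    cat -- "$@" | grep -c '^>'
}

# Usage (file names are placeholders):
#   count_seqs proteins.fasta more_proteins.fasta
#   zcat big.fasta.gz | count_seqs
```

Anchoring the pattern with ^ means a stray ">" in the middle of a line is never counted as an extra record.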

(oh, btw, this is not public domain and u can’t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).

Couldn’t help myself … everyone else is doing it 🙂

Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2

Recently I needed to install NCBI wwwblast on my local workstation so that some software could interface with BLAST via the web service. It was straightforward to install, but I took some notes, because a few changes were required with respect to the official wwwblast documentation at NCBI. These instructions are for Ubuntu 8.04, but will probably work with many recent flavours of Debian.

Software review: producing two dimensional diagrams of membrane proteins

E. coli LamB, presented using TMRPres2D. Note that the cytoplasmic/extracellular labels are incorrect, and should say extracellular/periplasmic.

I recently needed to make a simple, two-dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty-looking diagram, to save me constructing everything by hand. Here are two I found and tried.


ResolveRef updated : now with auto-suggest and source code

I updated ResolveRef last night and checked the most current source code in to svn at Google Code.

New features include:

ResolveRef, now prettier, with comments box by Disqus.

  • Suggest/autocomplete for journal title field, using the journal title lists provided by PubMed.
  • A “Verify” button. Allows a ResolveRef URL to be constructed with the web form and verified as working and valid without actually forwarding the user to the article.
  • Some bugfixes (handled the case where there is no DOI in the PubMed record, handled network timeouts to PubMed)
  • Refreshed visuals
  • Disqus comments box for feedback

In the interest of just getting something working quickly, I implemented the suggest feature in the laziest, possibly most RAM and CPU hungry way possible. The “jQuery Suggest” code queries the web app with a substring each time you type a character, and on the server side the app uses a regex to scan a ~1.5 MB list of journal titles held in RAM. I’ve already noticed a few “This request used a high amount of CPU” warnings in the logs, with the threat “High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled”. If my nasty hack starts heating up Google’s datacentre too much, I might have to disable the ‘suggest’ feature until I can implement it “properly”.
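To make the cost concrete, here is a rough shell sketch of that kind of lookup. It is not the app's actual code: the function name suggest_titles, the one-title-per-line file and the limit of 10 results are all assumptions for illustration, and it uses a fixed-string match where the app uses a regex.

```shell
#!/bin/sh
# Sketch only: suggest_titles, the titles file layout and the limit of 10
# suggestions are hypothetical, not taken from the ResolveRef source.
suggest_titles() {
    query=$1
    titles_file=$2
    # -i ignores case; -F treats the query as a literal string rather than
    # a regex. The file is scanned line by line until head has its 10 lines.
    grep -i -F -- "$query" "$titles_file" | head -n 10
}

# Usage (file name is a placeholder):
#   suggest_titles "j mol" journal_titles.txt
```

Every keystroke triggers another pass over the list, which is presumably where the CPU warnings come from.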


ResolveRef : looking at the logs

One of the nice features of Google App Engine is that you can easily view the logs for your application and quickly see which requests are generating errors. Browsing the logs of ResolveRef, I’ve been able to identify a few classes of query which, for one reason or another, weren’t working.
