TrinotateWeb in a Docker container

TrinotateWeb shows some reports from Trinotate. I know very little about it (please don’t ask me how to run Trinotate or interpret your results), but I wanted to serve up the reports. To make the services we provide to end-users a little more portable and reproducible, we tend to wrap them up as Docker containers. Even if we never actually move the images/containers between hosts, the Dockerfile acts as ‘runnable documentation’ of how a key part of the service is set up.

We do a similar thing for private instances of SequenceServer when researchers want a convenient interface to BLAST search some of their private (hopefully eventually open !) sequence databases.

The container here is self-contained with the data baked in. You may not want this, but an immutable container containing the analysis is what we wanted.

The code lives here: github.com/MonashBioinformaticsPlatform/bio-service-containers/

Requires:

  • TrinotateAnno.sqlite – the database generated via Trinotate
  • lighttpd.conf (provided) – preconfigured, don’t edit.

Dockerfile

FROM debian:buster-slim

ENV TRINOTATE_HOME=/app/Trinotate
ENV TRINOTATE_VERSION=3.1.1

WORKDIR /app

RUN apt-get -y update && \
    apt-get install -y lighttpd libhtml-template-perl libdbd-sqlite3-perl && \
    rm -rf /var/lib/apt/lists/*

# Not required, using Debian packages instead
# RUN apt-get install -y cpanminus build-essential
# RUN cpanm -i DBI && \
#     cpanm -i HTML && \
#     cpanm -i HTML::Template && \
#     cpanm -i DBD::SQLite

ADD https://github.com/Trinotate/Trinotate/archive/Trinotate-v${TRINOTATE_VERSION}.tar.gz Trinotate-v${TRINOTATE_VERSION}.tar.gz
RUN tar xvzf Trinotate-v${TRINOTATE_VERSION}.tar.gz && \
    rm Trinotate-v${TRINOTATE_VERSION}.tar.gz && \
    mv Trinotate-Trinotate-v${TRINOTATE_VERSION} Trinotate

COPY TrinotateAnno.sqlite /data/TrinotateAnno.sqlite 
COPY lighttpd.conf /app/lighttpd.conf

RUN chown -R www-data:www-data /app

EXPOSE 80

ENTRYPOINT ["lighttpd", "-D", "-f", "/app/lighttpd.conf"]
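
The Dockerfile above COPYs two files from the build directory, so the build fails late if either is missing. A minimal sketch of a pre-flight check and build step follows. The function names `missing_build_inputs` and `build_trinotate_image` are hypothetical helpers, not part of the repository, and the default tag is simply the one used in the `docker run` command in this post.

```shell
#!/bin/sh
# Sketch only: check that the files the Dockerfile needs are present
# before running docker build. Helper names are hypothetical.

# Print the names of any missing build inputs in directory $1 (default: .)
missing_build_inputs() {
    dir="${1:-.}"
    missing=""
    for f in Dockerfile TrinotateAnno.sqlite lighttpd.conf; do
        [ -f "$dir/$f" ] || missing="$missing $f"
    done
    # Strip the single leading space left by the loop, if any
    echo "${missing# }"
}

# Build the image from directory $1 with tag $2, refusing if inputs are missing
build_trinotate_image() {
    dir="${1:-.}"
    tag="${2:-trinotate:dataset_name}"
    missing="$(missing_build_inputs "$dir")"
    if [ -n "$missing" ]; then
        echo "Missing build inputs: $missing" >&2
        return 1
    fi
    docker build -t "$tag" "$dir"
}
```

Something like `build_trinotate_image . trinotate:dataset_name`, run from the directory holding the Dockerfile, would then produce the image that the run command expects.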

“Production”

On port 4569.

docker run --name DatasetName_Trinotate --restart=always -it -d -p 4569:80 trinotate:dataset_name

Use Apache to forward (proxy) to the container for a nice URL (eg /apps/trinotate/DatasetName), behind .htaccess.

Option: external data and config in a host directory

With a few small edits to the Dockerfile (comment out the Trinotate download and the sqlite db COPY), you can instead use an external copy of Trinotate and a database on the host.
You might want this while the data is still in flux, before baking it permanently into a container.

docker run --name DatasetName_Trinotate --rm -it -d -p 4569:80 \
    -v "$(pwd)":/app \
    -v /home/perry/bin/Trinotate-Trinotate-v3.1.1/:/app/Trinotate \
    -v "$(pwd)/TrinotateAnno.sqlite":/data/TrinotateAnno.sqlite \
    trinotate:dataset_name

Apache config

Use this to forward incoming requests to /apps/trinotate/DatasetName/ -> the port on the Docker container (4569), with a custom htaccess file for Basic Auth.

    # /apps/trinotate/DatasetName
    <Proxy "http://localhost:4569/*">
      Order deny,allow
      Allow from all
      Authtype Basic
      Authname "Restricted Content"
      AuthUserFile /etc/apache2/htaccess/DatasetName
      Require valid-user
    </Proxy>

    RewriteEngine on

    # For TrinotateWeb inside a Docker container - absolute URLs mean /css and /js links break
    # when proxied, unless we use this RewriteCond trick detecting referrer. 
    RewriteCond "%{HTTP_REFERER}" ".*bioinformatics.erc.monash.edu(?:.au)?/apps/trinotate/DatasetName/.*" [NV]
    RewriteRule ^/css/(.*)$ "http://localhost:4569/css/$1" [P]
    RewriteCond "%{HTTP_REFERER}" ".*bioinformatics.erc.monash.edu(?:.au)?/apps/trinotate/DatasetName/.*" [NV]
    RewriteRule ^/js/(.*)$ "http://localhost:4569/js/$1" [P]
    RewriteRule ^/apps/trinotate/DatasetName$ /apps/trinotate/DatasetName/ [R]
    RewriteRule ^/apps/trinotate/DatasetName/(.*)$ "http://localhost:4569/$1" [P]
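
The referrer trick works because the browser requests /css/... and /js/... at the site root, but sends a Referer header naming the page under /apps/trinotate/DatasetName/ that asked for them. A rough way to sanity-check which Referer values would pass is to try the pattern from the shell. This is only a sketch: `referer_matches` is a hypothetical helper, and because `grep -E` has no `(?:...)` groups it uses an ordinary group and escaped dots, so it approximates the Apache pattern rather than reproducing it exactly.

```shell
#!/bin/sh
# Sketch only: approximate the RewriteCond referrer test with grep -E.
# Assumption: a plain group (\.au)? stands in for (?:.au)?, and dots are
# escaped so they match literally.
referer_matches() {
    printf '%s\n' "$1" |
        grep -Eq 'bioinformatics\.erc\.monash\.edu(\.au)?/apps/trinotate/DatasetName/'
}

# Example: report whether a given Referer would trigger the /css and /js rules
check_referer() {
    if referer_matches "$1"; then
        echo "proxied: $1"
    else
        echo "ignored: $1"
    fi
}
```

Calling `check_referer` with a page URL under /apps/trinotate/DatasetName/ should report it as proxied, while a referrer from another dataset or another site should be ignored, which is what keeps one dataset's rules from capturing /css and /js requests meant for something else.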

TrinotateWeb makes requests to https://canvasxpress.org/ – as of 28-Jun-2018 its HTTPS certificate has expired. Users should first visit https://canvasxpress.org/ and accept the insecure certificate; otherwise the icons in TrinotateWeb will not load correctly.

Running a local JABAWS server for Jalview on Ubuntu (11.04 Natty)

The excellent Jalview sequence alignment visualization and editing tool has the ability to send a set of sequences to a multiple sequence alignment web service (“JABAWS”) and receive the results in a new alignment window. This is really convenient when you are doing lots of sequence analysis, and Geoff Barton’s group at the University of Dundee provide a JABAWS server that Jalview will use by default.

But maybe the Dundee server is down. Or maybe you think your local machine will do things faster. Or maybe you work on über secret sequences in some Faraday cage bunker with no permanent network connection. In each of these cases, you may want to run your own local JABAWS server and use that instead. In this case, read on.

Continue reading

Stack Exchange sites for science

Recently I’ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow – it’s a question and answer ‘forum’ for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted service called Stack Exchange, which allows anyone to start their own “Stack Overflow” based around any topic.

Image credit: http://www.flickr.com/photos/alicebartlett/ / CC BY-NC 2.0

The service was a little pricey ($129+/month), and I suspect this is one reason why a few open source clones inspired by Stack Overflow also exist. Since then, Stack Exchange sites (or clones) have proliferated – and those working as scientists (or those interested in science) haven’t been neglected. Here are my favorites:

  • MajorGroove.org pitches itself as a ‘forum for biologists’, which it is, although most of the content currently focuses on X-ray crystallography and associated techniques. It is currently in ‘bootstrap mode’, which means that reputation requirements are a little less strict until the userbase and site activity have grown to a critical size. Is there even a need for a Stack Exchange forum for biological crystallography? Macromolecular crystallography already has a single, central, de facto standard forum – the CCP4BB mailing list. While it may be antiquated by Web 2.0 standards, CCP4BB works well for a lot of people, and there is a huge amount of useful and important information buried in its archives. For many crystallographers, it seems CCP4BB would only be extracted from their “cold dead hands”. Despite this, I think the Stack Overflow format will be very beneficial for people new to the field. As a side note – I discovered MajorGroove via Graeme Winter’s XIA2 blog right around the time when I was considering kickstarting a “Stack Overflow for crystallography”. At the moment it seems that a small userbase of crystallographers is already established on MajorGroove, so there would be no purpose in another near-identical forum. Even if questions about other techniques in the biosciences start to dilute the structural biology, one click on the ‘crystallography’ tag or the ‘ccp4’ tag and you can get straight to the good stuff. (In fact this feature was deemed useful enough by Google that they decided to bless the ‘android’ tag on Stack Overflow as the official Android Q&A forum.)
  • NMRWiki Q&A (http://qa.nmrwiki.org/) is a StackExchange clone for magnetic resonance, mostly focused on NMR, but also open to EPR/ESR and MRI users. It’s not actually running on the StackExchange platform, but uses the open source OSQA / CNPROG clone, built on top of Django. As far as I know, there is no “CCP4BB for NMR”, which makes the NMRWiki Q&A site potentially even more valuable to structural biologists than its crystallography-centric cousin, MajorGroove. Back when I was doing my PhD using protein NMR spectroscopy as my primary technique, there were very few good resources like this online – I do less NMR these days, but you can bet that I’ll be using the NMRWiki Q&A and its associated wiki to refresh my memory and catch up on new methodological developments in the future.
  • BioStar (http://biostar.stackexchange.com/), a StackExchange for bioinformatics, computational genomics and systems biology questions and answers. This one is busier and better established than the forums mentioned above, probably because bioinformaticians spend more time in front of the computer than your average molecular biologist or structural biologist.
  • And, for a bit of fun: Skeptic Exchange (http://exchange.bristolskeptics.co.uk/), which covers rational questions and answers to various topics including pseudoscience, faith healing, the supernatural and alternative medicine.

Want more? There are a bunch of science-related StackExchanges listed under “Science” here: http://meta.stackexchange.com/questions/4/list-of-stackexchange-sites, and digging back through the FriendFeed archives I see Matt Todd initiated a concise listing (which, had I seen it, would probably have stopped me from ever starting this post).

And now, the latest* news: Stack Exchange 2.0 will be ‘free’. It looks like they are trying to structure the new Stack Exchange ecosystem a bit like the Usenet hierarchy (comp.*, rec.* etc), with a fairly involved discussion, proposal and acceptance process for new sites – it’s unclear yet whether this approach is going to work out better than just open sourcing the whole shebang, but time will tell. My guess is that BioStar, MajorGroove and probably even an incarnation of NMRWiki Q&A will eventually become part of this formalized ecosystem.

On one hand, making StackExchange sites free to run is great – it lowers the barrier to entry to allow many more sites to emerge and operate. On the other hand, as we have seen with the acquisition of FriendFeed by Facebook, not having a clear revenue stream can ultimately leave communities (such as The Life Scientists) without any certainty in a site’s future, potentially impacting growth and participation. Personally I’m much more inclined to invest time in a site if it is something like Wikipedia, where I know my contributions are very likely to live on, in some form, for decades (centuries?) to come. Ideally the archives of these new Stack Exchange sites could become useful online resources for decades to come – but with a single company at the helm and a “Web 2.0 business model”, continued operation for even a decade seems unlikely. The one saving grace: all content on the new Stack Exchange sites will be licensed under a Creative Commons license – so if Stack Exchange itself is acquired and shut down, we will always be able to preemptively leech the archives and provide them online elsewhere. Maybe it’s strange that I’m already thinking about archiving the new Stack Exchange upon its demise before it’s even begun, but I think it’s important to take the long term view with our data and recorded wisdom. Unlike in 1994, when GeoCities (R.I.P) was started, teh Internets is no longer a fad – the hard disks connected to it are fast becoming the sum of all accessible human knowledge, so we’d better make sure we can retain the good bits for a little longer than 10 years.

* – as all too common these days .. I’m a little behind the curve on this one. I meant to finish this post a month ago, but with a busy time pre-holiday, then the actual holiday, a month has gone by.

A proposal for encouraging user contributed annotations to Uniprot

Today I attended a presentation by Maria J Martin about Uniprot and various other EBI database services. At the end of the talk, someone asked something to the effect of “How about simplifying user submission of annotations / corrections?” – they wanted something in addition to the current ‘free text’ feedback and comments forms, and wanted a way to easily suggest annotations in a structured way. There was some suggestion of wikis etc, and how this had been tried to some extent, but they hadn’t got it right yet.

Here is my take on an approach to user submitted content to Uniprot. Essentially users should be able to add/change annotations piecewise, directly via the standard Uniprot web page for each protein record. These changes would ‘go live’ immediately, but since a large part of the value in Uniprot lies in its curation by expert annotators, the interface would also provide a very clear separation between user-submitted ‘uncurated’ annotations and the current expertly curated annotations.

I’ve made some mockups of how some parts of the UI may look in my little fantasy world:

Uniprot mockup 1, User/annotations and History

Continue reading

Naming in molecular biology: get comfortable with meaninglessness !

I noticed an interesting post over on BoingBoing: “Comfort with meaninglessness the key to good programmers”. It outlines some research by Dehnadi and Bornat on attributes that can predict aptitude in computer programming. They conclude that a “deep comfort with meaninglessness” is an important predictor of programming aptitude.

I think comfort with meaninglessness is an important skill in studying biology (and probably other sciences too). Many times, during the description of a system, various acronyms are thrown about as labels for entities (or ‘actors’) in that system. An important skill of the scientist is being able to follow how all the actors in the system relate to each other, without necessarily knowing anything about the specific properties of those actors. There are lots of protein and gene names which often bear very little meaning relative to the biological entity that they label, and fixating on what ‘the name’ means simply distracts from the true nature of the entity.

Continue reading