<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>

<channel>
	<title>Your bones got a little machine &#187; bioinformatics</title>
	<atom:link href="http://blog.pansapiens.com/category/bioinformatics/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.pansapiens.com</link>
	<description>Ideas are cheap, implementation is expensive; act accordingly.</description>
	<lastBuildDate>Mon, 17 May 2010 02:20:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
		<item>
		<title>Stack Exchange sites for science</title>
		<link>http://blog.pansapiens.com/2010/05/12/stackexchange-sites-for-science/</link>
		<comments>http://blog.pansapiens.com/2010/05/12/stackexchange-sites-for-science/#comments</comments>
		<pubDate>Wed, 12 May 2010 05:33:48 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[crystallography]]></category>
		<category><![CDATA[friendfeed]]></category>
		<category><![CDATA[nmr]]></category>
		<category><![CDATA[stack exchange]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=205</guid>
		<description><![CDATA[Recently I&#8217;ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow &#8211; it&#8217;s a question and answer &#8216;forum&#8217; for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I&#8217;ve noticed the emergence of several Stack Overflow-style sites for science-related questions and answers. For those unfamiliar with Stack Overflow &#8211; it&#8217;s a question and answer &#8216;forum&#8217; for computer programmers that keeps the signal-to-noise ratio very high through a carefully refined reputation system. Late last year the creators of Stack Overflow launched a hosted service called Stack Exchange, which allows anyone to start their own &#8220;Stack Overflow&#8221; based around any topic.</p>
<div id="attachment_222" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.flickr.com/photos/alicebartlett/2363694735/"><img class="size-medium wp-image-222   " style="margin-top: 2px; margin-bottom: 2px;" src="http://blog.pansapiens.com/wp-content/uploads/2010/05/2363694735_507a4eea3b_o-300x237.jpg" alt="2363694735_507a4eea3b_o" width="300" height="237" /></a><p class="wp-caption-text"> http://www.flickr.com/photos/alicebartlett/ / CC BY-NC 2.0</p></div>
<div>The service was a little pricey ($129+/month), and I suspect this is one reason why a few open source clones inspired by Stack Overflow also exist. Since then, Stack Exchange sites (or clones) have proliferated &#8211; and those working as scientists (or those interested in science) haven&#8217;t been neglected. Here are my favorites:</div>
<ul>
<li><a href="http://majorgroove.org/">MajorGroove.org</a> pitches itself as a &#8216;forum for biologists&#8217;, which it is, although most of the content currently focuses on X-ray crystallography and associated techniques. It is currently in &#8216;bootstrap mode&#8217;, which means that reputation requirements are a little less strict until the userbase and site activity have grown to a critical size. Is there even a need for a Stack Exchange forum for biological crystallography ? Macromolecular crystallography already has a single, central, <em>de facto </em>standard forum &#8211; the <a href="https://www.jiscmail.ac.uk/cgi-bin/webadmin?S1=CCP4BB">CCP4BB mailing list</a>. While it may be antiquated by Web2.0 standards, CCP4BB works well for a lot of people, and there is a huge amount of useful and important information buried in its archives. For many crystallographers, it seems CCP4BB would only be extracted from their &#8220;cold dead hands&#8221;. Despite this, I think the Stack Overflow format will be very beneficial for people new to the field. <em>As a side note</em> &#8211; I discovered MajorGroove via <a href="http://xia2.blogspot.com/">Graeme Winter&#8217;s XIA2 blog</a> right around the time when I was considering kickstarting a &#8220;Stack Overflow for crystallography&#8221;. At the moment it seems that a small userbase of crystallographers is already established on MajorGroove and there would be no point in starting another near-identical forum. Even if questions about other techniques in the biosciences start to dilute out the structural biology, one click on the &#8216;<a href="http://www.majorgroove.org/questions/tagged/crystallography">crystallography</a>&#8217; tag or the &#8216;<a href="http://www.majorgroove.org/questions/tagged/ccp4">ccp4</a>&#8217; tag, and you can get straight to the good stuff. 
(In fact this feature was deemed useful enough by Google that they decided to bless the &#8216;<a href="http://stackoverflow.com/questions/tagged/android">android</a>&#8217; tag on Stack Overflow as the official Android Q&amp;A forum).</li>
<li>NMRWiki Q&amp;A (<a href="http://qa.nmrwiki.org/">http://qa.nmrwiki.org/</a>) is a StackExchange-clone for magnetic resonance, mostly focused on NMR, but also open to EPR/ESR and MRI users. It&#8217;s not actually running on the StackExchange platform, but uses the open source <a href="http://github.com/cnprog/CNPROG/network">OSQA / CNPROG</a> clone, built on top of Django. As far as I know, there is no &#8220;CCP4BB for NMR&#8221;, which makes the NMRWiki Q&amp;A site potentially even more valuable to structural biologists than its crystallography-centric cousin, MajorGroove. Back when I was doing my PhD using protein NMR spectroscopy as my primary technique, there were very few good resources like this online &#8211; I do less NMR these days, but you can bet that I&#8217;ll be using the NMRWiki Q&amp;A and its associated wiki to refresh my memory and catch up on new methodological developments in the future.</li>
<li>BioStar (<a href="http://biostar.stackexchange.com/">http://biostar.stackexchange.com/</a>), a StackExchange for bioinformatics, computational genomics and systems biology questions and answers. This one is busier and better established than the above-mentioned forums, probably by virtue of the fact that bioinformaticians spend more time in front of the computer than your average molecular biologist or structural biologist.</li>
<li>And, for a bit of fun: Skeptic Exchange (<a href="http://exchange.bristolskeptics.co.uk/">http://exchange.bristolskeptics.co.uk/</a>), which covers rational questions and answers to various topics including pseudoscience, faith healing, the supernatural and alternative medicine.</li>
</ul>
<p>Want more ? There are a bunch of science related StackExchanges listed under &#8220;Science&#8221; here: <a href="http://meta.stackexchange.com/questions/4/list-of-stackexchange-sites">http://meta.stackexchange.com/questions/4/list-of-stackexchange-sites</a> .. and digging back through the <a href="http://friendfeed.com/todd-lab/dd6ae79e/some-stack-exchange-based-science-discussion#">FriendFeed archives I see Matt Todd initiated a concise listing</a> (which if I&#8217;d seen, I probably never would have started this post).</p>
<p>And now, the latest<strong>*</strong> news: <a href="http://blog.stackexchange.com/post/518474918/stack-exchange-2-0">Stack Exchange 2.0 will be &#8216;free&#8217;</a>. It looks like they are trying to structure the new Stack Exchange ecosystem a bit like the Usenet hierarchy (comp.*, rec.* etc), with a fairly involved discussion, proposal and acceptance process for new sites &#8211; it&#8217;s unclear yet whether this approach is going to work out better than just open sourcing the whole shebang, but time will tell. My guess is that BioStar, MajorGroove and probably even an incarnation of NMRWiki Q&amp;A will eventually become part of this formalized ecosystem.</p>
<p>On one hand making StackExchange sites free to run is great &#8211; it lowers the barrier to entry to allow many more sites to emerge and operate. On the other hand, as we have seen with the acquisition of FriendFeed by Facebook, not having a clear revenue stream can ultimately leave communities (such as <a href="http://friendfeed.com/the-life-scientists">The Life Scientists</a>) without any certainty in a site&#8217;s future, potentially impacting growth and participation. Personally I&#8217;m much more inclined to invest time in a site if it is something like Wikipedia, where I know my contributions are very likely to live on, in some form, for decades (centuries ?) to come. Ideally the archives of these new Stack Exchange sites could become useful online resources for decades to come &#8211; but with a single company at the helm and a &#8220;Web 2.0 business model&#8221;, continued operation for even a decade seems unlikely. The one saving grace: all content on the new Stack Exchange sites will be licensed under a Creative Commons license &#8211; so if Stack Exchange itself is acquired and shut down, we will always be able to preemptively leech the archives and provide them online elsewhere. Maybe it&#8217;s strange that I&#8217;m already thinking about archiving the new Stack Exchange upon its demise before it&#8217;s even begun, but I think it&#8217;s important to take the long term view with our data and recorded wisdom. Unlike in 1994, when GeoCities (<a href="http://www.oocities.com/">R.I.P</a>) was started, teh Internets is no longer a fad &#8211; the hard disks connected to it are fast becoming the sum of all accessible human knowledge, so we&#8217;d better make sure we can retain the good bits for a little longer than 10 years.</p>
<p><em>* &#8211; as is all too common these days .. I&#8217;m a little behind the curve on this one. I meant to finish this post a month ago, but with a busy time pre-holiday, then the actual holiday, a month has gone by.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2010/05/12/stackexchange-sites-for-science/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>A proposal for encouraging user contributed annotations to Uniprot</title>
		<link>http://blog.pansapiens.com/2009/08/03/a-proposal-for-encouraging-user-contributed-annotations-to-uniprot/</link>
		<comments>http://blog.pansapiens.com/2009/08/03/a-proposal-for-encouraging-user-contributed-annotations-to-uniprot/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 09:21:56 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[uniprot]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=143</guid>
		<description><![CDATA[Today I attended a presentation by Maria J Martin about Uniprot and various other EBI database services. At the end of the talk, someone asked something to the effect of &#8220;How about simplifying user submission of annotations / corrections&#8221; &#8211; they wanted something in addition to the current &#8216;free text&#8217; feedback and comments forms, and [...]]]></description>
			<content:encoded><![CDATA[<p>Today I attended a presentation by Maria J Martin about Uniprot and various other EBI database services. At the end of the talk, someone asked something to the effect of &#8220;How about simplifying user submission of annotations / corrections&#8221; &#8211; they wanted something in addition to the current &#8216;free text&#8217; feedback and comments forms, and wanted a way to easily suggest annotations in a structured way. There was some suggestion of wikis etc., and how this had been tried to some extent, but they hadn&#8217;t got it right yet.</p>
<p>Here is my take on an approach to user submitted content to Uniprot. Essentially users should be able to add/change annotations piecewise, directly via the standard Uniprot web page for each protein record. These changes would &#8216;go live&#8217; immediately, but since a large part of the value in Uniprot lies in its curation by expert annotators, the interface would also provide a very clear separation between user-submitted &#8216;uncurated&#8217; annotations and the current expertly curated annotations.</p>
<p>I&#8217;ve made some mockups of how some parts of the UI may look in my little fantasy world:</p>
<p style="text-align: left;"><a href="http://blog.pansapiens.com/wp-content/uploads/2009/08/mockup1_history_crop.png" rel="lightbox[143]"><img class="aligncenter size-medium wp-image-144" title="Uniprot mockup 1, User/annotations and History" src="http://blog.pansapiens.com/wp-content/uploads/2009/08/mockup1_history_crop-300x97.png" alt="Uniprot mockup 1, User/annotations and History" width="300" height="97" /></a><span id="more-143"></span><br />
• User login box at top (eg, OpenID)<br />
• A History tab at the top.<br />
• User submitted changes tab.<br />
• Maybe a &#8220;Discussion&#8221; tab, ala Wikipedia (not pictured).<br />
• Each field, or block of related fields, would have an Add/edit button at the top right of the block. (I&#8217;ve chosen the <a href="http://universaleditbutton.org/Universal_Edit_Button">Universal Edit Button</a> as an example)</p>
<p style="text-align: left;"><em>Afterthought: Maybe putting these features under tabs isn&#8217;t quite the best place, since the existing tabs are &#8216;actions&#8217; that can be taken rather than &#8216;extra info&#8217; to be viewed. This UI detail could certainly be refined.</em></p>
<p style="text-align: left;">
<a href="http://blog.pansapiens.com/wp-content/uploads/2009/08/mockup2_edit_button_crop.png" rel="lightbox[143]"><img class="aligncenter size-medium wp-image-145" title="Uniprot mockup 2, an edit button" src="http://blog.pansapiens.com/wp-content/uploads/2009/08/mockup2_edit_button_crop-300x88.png" alt="Uniprot mockup 2, an edit button" width="300" height="88" /></a><br />
This proposal has many wiki-like features (history, attribution, open editing, curation by trusted users and potentially page/section locking) but doesn&#8217;t really fit my definition of a wiki since the input format is not free-form wiki-text, but is instead constrained by the interface to enforce the submission of (mostly) structured data (eg, a traditional data entry into an HTML form, or in-line editing of fields).</p>
<p>Any authenticated user would be able to add or edit fields by clicking on the &#8220;Add/edit annotations&#8221; button associated with that block (see mockup above). They would then be sent to a page where they can click to edit a particular field (in this case a point mutation and associated change in function), or click &#8220;Add new&#8221; to add a new mutation field and fill out the details (I didn&#8217;t make a mockup picture for this .. use your imagination). They also must specify one of the standard &#8220;evidence codes&#8221; from a dropdown box for each change/addition, including the PMID of a publication if relevant. User submissions are automatically flagged with some type of &#8216;user submitted&#8217; flag too, and a username. Homologs (from UniRef clusters) could also be listed here to remind the user that certain annotations might need to be propagated to other members of the same family, if required (otherwise the curators would do this part, when applicable, for the next Uniprot release). For all I know, Uniprot may already have an interface similar to this, already in use by their professional curators. In effect, I&#8217;d like to see the 37signals &#8220;<a href="http://gettingreal.37signals.com/ch09_One_Interface.php">One interface</a>&#8221; dictum applied.</p>
<p>User submitted changes would not automatically go live on the main Uniprot record page, but can be seen by clicking the &#8220;User submitted&#8221; tab at the top. Alternatively, the user submitted annotations could be put at the bottom of the page, like most blog comments, but clearly differentiated from the curated data by colour and other visual cues. The REST API could be told to include/exclude uncurated user annotations in responses by an extra query flag in the request (eg &amp;userannotations=true). Uniprot curators can periodically review user submitted annotations and integrate them into the official Uniprot release as they see fit.</p>
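<p>As a rough sketch of what such a request might look like from the shell: the <em>userannotations</em> flag is the hypothetical one proposed above and does not exist in the real API, and the URL pattern, the example accession and the <em>uniprot_url</em> helper are likewise assumptions for illustration only.</p>

```shell
#!/bin/sh
# Build a record URL carrying the proposed (hypothetical) userannotations flag.
# The flag is not part of the real Uniprot API; the URL pattern is an assumption.
uniprot_url() {
    accession=$1
    include_user=${2:-false}   # default: curated annotations only
    printf 'http://www.uniprot.org/uniprot/%s.txt?userannotations=%s\n' \
        "$accession" "$include_user"
}

# Ask for a record including uncurated user annotations.
uniprot_url P12345 true
```

<p>Defaulting the flag to <em>false</em> would keep existing clients seeing only curated data unless they explicitly opt in.</p>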
<p>Under the History tab, the history of changes to that Uniprot record, both by user submitted changes and by Uniprot release would be available. This functionality is already mostly available under &#8220;Entry history-&gt;Complete history&#8221; at the bottom of the page, but user submitted annotations would also be included here with appropriate diff colouring (eg, coloured differently to curated changes, until they are officially accepted).</p>
<p>Providing user pages at a URL: <em>http://www.uniprot.org/user/some_sensible_username</em> with an associated RSS/ATOM feed would encourage participation by highlighting individual user contributions, and potentially allow a Wikipedia-like community of expert/fanatical annotators to emerge.</p>
<p>The Discussion tab would be used in much the same way Wikipedia Talk pages are &#8211; passive users, contributors and curators would be able to discuss the finer details of any submitted annotations. I&#8217;m of two minds about this one, since anyone who has read Wikipedia Talk pages knows things can get quite ugly there sometimes. On the other hand, the communication it allows would be important for building a community of annotators and helping clarify contributions.</p>
<p>PS: I&#8217;m a Uniprot fanboy. Can you tell ? <img src='http://blog.pansapiens.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2009/08/03/a-proposal-for-encouraging-user-contributed-annotations-to-uniprot/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>Naming in molecular biology: get comfortable with meaninglessness !</title>
		<link>http://blog.pansapiens.com/2008/12/14/naming-in-molecular-biology-get-comfortable-with-meaninglessness/</link>
		<comments>http://blog.pansapiens.com/2008/12/14/naming-in-molecular-biology-get-comfortable-with-meaninglessness/#comments</comments>
		<pubDate>Sun, 14 Dec 2008 01:13:45 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[molecular biology]]></category>
		<category><![CDATA[semantic]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=92</guid>
		<description><![CDATA[I noticed an interesting post over on BoingBoing: &#8220;Comfort with meaninglessness the key to good programmers&#8221;. It outlines some research by Dehnadi and Bornat on attributes that can predict aptitude in computer programming. They conclude that a &#8220;deep comfort with meaninglessness&#8221; is an important predictor of programming aptitude. I think comfort with meaninglessness is an [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed an interesting post over on BoingBoing: &#8220;<a href="http://www.boingboing.net/2008/12/12/comfort-with-meaning.html">Comfort with meaninglessness the key to good programmers</a>&#8221;. It outlines some <a href="http://www.cs.mdx.ac.uk/research/PhDArea/saeed/">research by Dehnadi and Bornat</a> on attributes that can predict aptitude in computer programming. They conclude that a &#8220;deep comfort with meaninglessness&#8221; is an important predictor of programming aptitude.</p>
<p>I think comfort with meaninglessness is an important skill in studying biology (and probably other sciences too). Many times, during the description of a system, various acronyms are thrown about as labels for entities (or &#8216;actors&#8217;) in that system. An important skill of the scientist is being able to follow how all the actors in the system relate to each other, without necessarily knowing anything about the specific properties of those actors. There are lots of protein and gene names which often bear very little meaning relative to the biological entity that they label, and fixating on what &#8216;the name&#8217; means simply distracts from the true nature of the entity.</p>
<p><span id="more-92"></span></p>
<p><strong>Example:</strong> TPR proteins are a superfamily of protein fold, often involved in protein-protein interactions. I have sometimes been asked at poster presentations, or the occasional talk: &#8220;What does TPR stand for ?&#8221;. &#8220;TPR&#8221; is an acronym for &#8220;<em>t</em>etratrico<em>p</em>eptide <em>r</em>epeat&#8221; &#8230; you may be able to glean from that expansion that the protein fold is composed of repeat sequences 34 amino acids long &#8211; but that is only one small aspect of the family, and isn&#8217;t the important point. Yet many molecular biologists appear uncomfortable with an &#8220;undefined&#8221; acronym, and insist on having it expanded to reveal the full name. TPR is just a convenient label for the superfamily &#8230; it could equally have been called <em>GrratBlat</em> or <em>5450520A, </em>it would still be the same thing. The point is, you <em>shouldn&#8217;t have to ask</em> what TPR stands for. Sure, it&#8217;s a curiosity, and some protein names can be amusing (Sonic Hedgehog, or &#8220;Just Another Kinase&#8221; come to mind), it may also contain some meaning, but first and foremost it&#8217;s a label &#8211; something to link the entity to all the other descriptive information about its structure, function, localisation and regulation. Like many classes of protein, the original name was given at a time when little was actually known about the thing, and typically the meaning embedded in the name should be ignored lest it bias our interpretation about what that protein really does.</p>
<p><strong>Summary of opinion:</strong> Molecular biologists should become comfortable with the notion that a name is just a label &#8211; meaningless without the associated metadata.</p>
<p>All of this is probably second nature to those who studied philosophy (or computer science, or linguistics) &#8230; I&#8217;m guessing it is an issue of <a href="http://en.wikipedia.org/wiki/Semantic">semantics</a>. I really should have taken some of those subjects back in my undergrad days &#8230; <img src='http://blog.pansapiens.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/12/14/naming-in-molecular-biology-get-comfortable-with-meaninglessness/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>Searching bioinformatic databases with YubNub</title>
		<link>http://blog.pansapiens.com/2008/11/12/searching-bioinformatic-databases-with-yubnub/</link>
		<comments>http://blog.pansapiens.com/2008/11/12/searching-bioinformatic-databases-with-yubnub/#comments</comments>
		<pubDate>Wed, 12 Nov 2008 11:29:16 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[two-point-oh]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[yubnub]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=87</guid>
		<description><![CDATA[You may already be familiar with YubNub; it describes itself as &#8220;the social command line for the web&#8221;. Most commands consist of two (or more) words &#8230; one for the search engine, the other for the query. For example, typing: gg open science on friendfeed into the YubNub search box searches Google for &#8220;open science [...]]]></description>
			<content:encoded><![CDATA[<p>You may already be familiar with <a href="http://yubnub.org/">YubNub</a>; it describes itself as &#8220;the social command line for the web&#8221;. Most commands consist of two (or more) words &#8230; one for the search engine, the other for the query.</p>
<p>For example, typing:</p>
<blockquote><p><em><strong>gg open science on friendfeed</strong></em></p></blockquote>
<p>into the YubNub search box searches Google for &#8220;<em>open science on friendfeed</em>&#8221;, via YubNub.</p>
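<p>Under the hood, a YubNub command is just a URL, so it can be built from a shell script too. Here is a minimal sketch, assuming the <em>http://yubnub.org/parser/parse?command=</em> endpoint (an assumption on my part, as is the hypothetical <em>yubnub_url</em> helper name). It only encodes spaces, so queries containing other reserved characters would need proper URL encoding.</p>

```shell
#!/bin/sh
# Print the URL that a YubNub command would be sent to.
# Assumption: the parser endpoint and its "command" parameter.
yubnub_url() {
    # "$*" joins all arguments with single spaces; tr then turns spaces into "+".
    # Only spaces are encoded here; other reserved characters are left as-is.
    printf 'http://yubnub.org/parser/parse?command=%s\n' \
        "$(printf '%s' "$*" | tr ' ' '+')"
}

yubnub_url gg open science on friendfeed
```

<p>The printed URL can then be handed to a browser or any other tool that opens links.</p>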
<p>I thought I&#8217;d highlight a few life science- and bioinformatics-related YubNub commands I find myself using quite often in my day-to-day work. Some are commands I created, others someone else created. This is the beauty of YubNub &#8230; often someone has already made the &#8216;obvious&#8217; command &#8230; it&#8217;s worth just trying to search with a command you expect to exist, since it often does.</p>
<p>Onward, with the list:</p>
<p><span id="more-87"></span></p>
<ul>
<li><a href="http://yubnub.org/kernel/man?args=pubmed"><strong>pubmed</strong></a> &#8212; Searches PubMed</li>
<li><a href="http://yubnub.org/kernel/man?args=hubmed"><strong>hubmed</strong></a> &#8212; Searches <a href="http://www.hubmed.org/">HubMed</a> (Alf Eaton&#8217;s feature-rich alternative interface to PubMed)</li>
<li><a href="http://yubnub.org/kernel/man?args=gopubmed"><strong>gopubmed</strong></a> &#8212; Searches <a href="http://www.gopubmed.org/">GoPubMed</a> (an ontology enhanced PubMed search)</li>
<li><a href="http://yubnub.org/kernel/man?args=doi"><strong>doi</strong></a> &#8212; Redirects you based on a Digital Object Identifier (DOI), via <span class="muted">http://dx.doi.org/</span></li>
<li><a href="http://yubnub.org/kernel/man?args=pdb"><strong>pdb</strong></a> &#8212; Searches the Protein DataBank for 3D structures. Usually the search term should be a 4-character PDB code.</li>
<li><a href="http://yubnub.org/kernel/man?args=uniprot"><strong>uniprot</strong></a> &#8212; Searches the Uniprot database (use an accession, id or keyword as the query).</li>
<li><a href="http://yubnub.org/kernel/man?args=ihop"><strong>ihop</strong></a> &#8212; Searches <a href="http://www.ihop-net.org">iHOP</a>, information Hyperlinked over Proteins, for views of the biomedical literature guided by gene networks. Nothing to do with <a href="http://www.google.com/search?q=ihop">pancakes (or prayer)</a>.</li>
</ul>
<p>There is also a class of more general, non-biomedical commands which I often use:</p>
<ul>
<li><a href="http://yubnub.org/kernel/man?args=gg"><strong>gg</strong></a> &#8212; The Google.</li>
<li><a href="http://yubnub.org/kernel/man?args=gim"><strong>gim</strong></a> &#8212; The Google Image Search.</li>
<li><a href="http://yubnub.org/kernel/man?args=wp"><strong>wp</strong></a> &#8212; Good ol&#8217; Wikipedia.</li>
<li><strong><a href="http://yubnub.org/kernel/man?args=ucc">ucc</a></strong> &#8212; The universal currency converter at XE.com. Use it like <strong><em>ucc 399 aud usd</em></strong>, to convert 399 Australian dollars to US dollars. Then, if you have your cash in Australian dollars, weep about the recent drop in the exchange rate <img src='http://blog.pansapiens.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </li>
<li><strong><a href="http://yubnub.org/kernel/man?args=man">man</a></strong> &#8212; Like *nix man &#8216;manual pages&#8217;, but for YubNub commands. Eg, <strong><em>man ucc</em></strong> will give the manual page describing how to use the <em>ucc</em> command.</li>
<li><strong><a href="http://yubnub.org/kernel/man?args=ls">ls</a></strong> &#8212; A bit like the *nix shell ls, this command lists existing YubNub commands that contain your query in their name, description or url. eg. searching <strong><em><a href="http://yubnub.org/kernel/ls?args=protein">ls protein</a></em></strong> gives you a short list of all the commands related to proteins.</li>
</ul>
<p>I&#8217;ve installed the <a href="http://mycroft.mozdev.org/search-engines.html?name=yubnub">YubNub opensearch plugin</a> so I can search directly from the search box (or location bar) in Firefox. Maybe one day <a href="https://wiki.mozilla.org/Labs/Ubiquity">Ubiquity</a> will fulfill this purpose, since in many ways it is the natural progression of the YubNub idea. But for the moment YubNub is the fastest, most streamlined way I&#8217;ve found to quickly fire off a search when I need to hunt down a reference, protein sequence or 3D structure. Nothing like instant gratification <img src='http://blog.pansapiens.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/11/12/searching-bioinformatic-databases-with-yubnub/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>Count the number of sequences in a FASTA format file: a Unix shell snippet</title>
		<link>http://blog.pansapiens.com/2008/09/01/count-the-number-of-sequences-in-a-fasta-format-file-a-unix-shell-snippet/</link>
		<comments>http://blog.pansapiens.com/2008/09/01/count-the-number-of-sequences-in-a-fasta-format-file-a-unix-shell-snippet/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 05:49:38 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=82</guid>
		<description><![CDATA[Sometimes it&#8217;s nice to quickly check how many sequences are in a FASTA format sequence file. It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA &#8220;flat-file database&#8221;, based on the presence of the &#8220;&#62;&#8221; header symbol. #!/bin/sh # ~/bin/countseqs [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes it&#8217;s nice to quickly check how many sequences are in a <a href="http://en.wikipedia.org/wiki/FASTA_format">FASTA format</a> sequence file.</p>
<p>It barely warrants its own blog post, but here we go anyhow: my one-liner shell script for counting the number of sequences in a FASTA &#8220;flat-file database&#8221;, based on the presence of the &#8220;&gt;&#8221; header symbol.</p>
<div class="dean_ch" style="white-space: wrap;"><span class="re3">#!/bin/sh</span><br />
<span class="re3"># ~/bin/countseqs</span><br />
<span class="re3"># Counts the number of sequences <span class="kw1">in</span> a FASTA format file</span><br />
<span class="kw2">grep</span> -c <span class="st0">&quot;^&gt;&quot;</span> <span class="st0">&quot;$1&quot;</span></div>
<p>Dead easy huh ? I put this in <em>~/bin/countseqs,</em> make it executable (<em>chmod +x ~/bin/countseqs</em>) and use it in lots of situations, as a quick sanity check.</p>
<p><em>(oh, btw, this is not public domain and u can&#8217;t use it for commercial gain without paying me a license fee. academic users can fax me something for a free license. k thx bye).</em></p>
<p><span style="color: #000000;">Couldn&#8217;t help myself &#8230; everyone else is doing it <img src='http://blog.pansapiens.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/09/01/count-the-number-of-sequences-in-a-fasta-format-file-a-unix-shell-snippet/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>Setting up NCBI wwwblast on Ubuntu 8.04 (Hardy), Apache 2</title>
		<link>http://blog.pansapiens.com/2008/08/25/setting-up-wwwblast-on-ubuntu-apache/</link>
		<comments>http://blog.pansapiens.com/2008/08/25/setting-up-wwwblast-on-ubuntu-apache/#comments</comments>
		<pubDate>Mon, 25 Aug 2008 07:28:35 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[blast]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=80</guid>
		<description><![CDATA[Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the official wwwblast documentation at NCBI. These instructions are for [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.pansapiens.com/wp-content/uploads/2008/08/wwwblast.png" rel="lightbox[80]"><img class="alignright size-medium wp-image-81" title="wwwblast" src="http://blog.pansapiens.com/wp-content/uploads/2008/08/wwwblast-300x262.png" alt="" width="300" height="262" /></a></p>
<p>Recently I needed to install NCBI wwwblast on my local workstation to enable some software that needed to interface with BLAST via the web service. It was straightforward to install, but I took some notes, because there were a few changes required with respect to the <a href="http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/wwwblast.html">official wwwblast documentation at NCBI</a>. These instructions are for Ubuntu 8.04, but probably will work with many recent flavours of Debian.<br />
<span id="more-80"></span></p>
<h3>Download and install</h3>
<p>Download NCBI wwwblast from <a href="ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/">ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/</a> (The version <span class="file">I used was wwwblast-2.2.18-ia32-linux.tar.gz).</span></p>
<p>Untar it into <em>/var/www/</em>, preserving permissions (the commands below assume the tarball was downloaded to that directory).</p>
<pre><strong>$</strong> cd /var/www/</pre>
<pre><strong>$</strong> sudo tar zxvpf <span class="file">wwwblast-2.2.18-ia32-linux.tar.gz
</span></pre>
<p><span class="file">You will also need to make sure <em>csh</em> (the &#8220;C-shell&#8221;) is installed, since the <em>blast.cgi</em> script needs this to run:</span></p>
<pre><span class="file"><strong>$</strong> sudo apt-get install csh
</span></pre>
<p><em><span class="file">(Thanks to </span><span class="commentauthor">jpopesku for spotting this missing package [comments below])</span></em></p>
<h3>Set up Apache2</h3>
<p><a href="http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/node10.html">The instructions in the official wwwblast manual</a> didn&#8217;t seem to work for Apache2 (could be a typo: <em>Follow SymLinks</em> should be <em>FollowSymLinks</em> &#8230; one camel-case word without the space). I put this into the VirtualHost definition in the standard <em>/etc/apache2/sites-available/default</em> file used by Apache2 in Ubuntu 8.04.</p>
<div class="dean_ch" style="white-space: wrap;"> &nbsp; &nbsp;# modified slightly from:<br />
&nbsp; &nbsp; #http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/node10.html<br />
&nbsp; &nbsp; &lt;Directory &quot;/var/www&quot;&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; AddHandler cgi-script cgi REAL<br />
&nbsp; &nbsp; &nbsp; &nbsp; Options Indexes FollowSymLinks MultiViews +ExecCGI<br />
&nbsp; &nbsp; &nbsp; &nbsp; Order allow,deny<br />
&nbsp; &nbsp; &nbsp; &nbsp; Allow from all<br />
&nbsp; &nbsp; &lt;/Directory&gt;</div>
<p>The code above was inserted just before the &lt;/VirtualHost&gt; closing tag.</p>
<p>Restart Apache for good measure.</p>
<pre><strong>$</strong> sudo /etc/init.d/apache2 restart</pre>
<h3>Configure your wwwblast to see your databases</h3>
<p>Assuming you already have some existing <a href="http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastdb.html">BLAST formatted databases made with <em>formatdb</em></a>, you will then need to configure wwwblast to find them, <a href="http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/wwwblast/node12.html">as per the manual</a>. In short, this involves putting the BLAST database files into <em>/var/www/blast/db/ </em>(or symlinking to the files), modifying <em>/var/www/blast/blast.rc </em>(or one of the other *.rc files, for other BLAST programs, eg psiblast) to associate the database name with a BLAST program, and modifying the dropdown database list in <em>/var/www/blast/blast.html </em>(or one of the other *.html files for other BLAST programs) to add the name of the database(s).</p>
<p>My BLAST databases live in <em>/data/databases/blast/</em>, so I simply moved the example database directory <em>/var/www/blast/db </em>to <em>/var/www/blast/db.orig<strong>:</strong></em></p>
<pre><strong>$</strong> sudo mv  /var/www/blast/db /var/www/blast/db.orig</pre>
<p>then symlinked <em>/var/www/blast/db</em> to <em>/data/databases/blast</em>:</p>
<pre><strong>$</strong> sudo ln -s /data/databases/blast /var/www/blast/db</pre>
<p>In <em>/var/www/blast/blast.rc</em> I changed the line:</p>
<pre>blastp test_aa_db</pre>
<p>to:</p>
<pre>blastp nr swissprot pdbaa</pre>
<p><em>(you may also want to configure blastn etc in the same way)</em></p>
<p>In <em>/var/www/blast/blast.html</em>, I changed the test database code:</p>
<div class="dean_ch" style="white-space: wrap;">&lt;select name = &quot;DATALIB&quot;&gt;<br />
&nbsp; &nbsp; &lt;option VALUE = &quot;test_aa_db&quot;&gt; test_aa_db<br />
&nbsp; &nbsp; &lt;option VALUE = &quot;test_na_db&quot;&gt; test_na_db<br />
&lt;/select&gt;</div>
<p>to:</p>
<div class="dean_ch" style="white-space: wrap;">&lt;select name = &quot;DATALIB&quot;&gt;<br />
&nbsp; &nbsp; &lt;option VALUE = &quot;nr&quot;&gt; nr<br />
&nbsp; &nbsp; &lt;option VALUE = &quot;swissprot&quot;&gt; swissprot<br />
&nbsp; &nbsp; &lt;option VALUE = &quot;pdbaa&quot;&gt; pdbaa<br />
&lt;/select&gt;</div>
<p><em>(you may want to do this for other BLAST programs too, eg also edit psiblast.html)</em></p>
<p>This will allow me to search the <em>nr</em>, <em>swissprot</em> and <em>pdbaa</em> databases I have installed using plain-vanilla BLAST.</p>
<h3>Test it out</h3>
<p>Go to <a href="http://localhost/blast/blast.html" target="_blank">http://localhost/blast/blast.html</a>.</p>
<p>You should find the wwwblast interface, with a dropdown box featuring the database(s) you added. Run a test search with your favorite sequences (ensure you also select the correct blast program, eg <em>blastp</em> !). Enjoy using your own CPU time !</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/08/25/setting-up-wwwblast-on-ubuntu-apache/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>Software review: producing two dimensional diagrams of membrane proteins</title>
		<link>http://blog.pansapiens.com/2008/06/26/software-review-producing-two-dimensional-diagrams-of-membrane-proteins/</link>
		<comments>http://blog.pansapiens.com/2008/06/26/software-review-producing-two-dimensional-diagrams-of-membrane-proteins/#comments</comments>
		<pubDate>Wed, 25 Jun 2008 20:30:22 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[publication]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[structural biology]]></category>
		<category><![CDATA[two-point-oh]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[beta-barrels]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[structure]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=60</guid>
		<description><![CDATA[I recently needed to make a simple, two dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty looking diagram to save me constructing everything by hand. Here are two I found and tried. TMRPres2D Ioannis C. Spyropoulos, Theodore D. Liakopoulos, Pantelis [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/tmrpres2d_lamb_ecoli.png" rel="lightbox[60]"><img class="alignright size-medium wp-image-61" title="TMRPres2D LAMB_ECOLI" src="http://blog.pansapiens.com/wp-content/uploads/2008/06/tmrpres2d_lamb_ecoli-300x250.png" alt="E. coli LamB, presented using TMRPres2D. Note that the cytoplasmic/extracellular labels are incorrect, and should say extracellular/periplasmic." width="300" height="250" /></a></p>
<p>I recently needed to make a simple, two dimensional figure of a beta-barrel membrane protein. I went hunting for programs that might take a sequence and/or structure and produce a pretty looking diagram to save me constructing everything by hand. Here are two I found and tried.</p>
<p><span id="more-60"></span></p>
<p><strong><a href="http://bioinformatics.biol.uoa.gr/TMRPres2D/">TMRPres2D</a></strong></p>
<p><span>Ioannis C. Spyropoulos, Theodore D. Liakopoulos, Pantelis G. Bagos and Stavros J. Hamodrakas</span><span><strong> TMRPres2D: high quality visual representation of transmembrane protein models</strong><span style="text-decoration: underline;"> Bioinformatics</span>. 2004;  20: 3258-3260. (<a href="http://resolveref.appspot.com/ref/Bioinformatics/2004/20/3258">link</a>)<br />
</span><br />
<strong>Pros:</strong></p>
<ul>
<li> Cross-platform (Java)</li>
<li> Simple interface, GUI (zero learning curve)</li>
<li> Lots of input options (defines transmembrane regions directly from SwissProt or PIR annotations online, takes input from several transmembrane region predictors)<a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/tmrpres2d_secy_bucai.png" rel="lightbox[60]"><img class="alignright size-medium wp-image-62" title="TMRPres2D SECY_BUCAI" src="http://blog.pansapiens.com/wp-content/uploads/2008/06/tmrpres2d_secy_bucai-300x197.png" alt="TMRPres2D diagram of SECY_BUCAI." width="300" height="197" /></a></li>
<li> Lots of output formats and options (Postscript, gif, jpg, png, svg, bmp)</li>
<li> Various colouring options (hydrophobicity, charge, &#8220;printer friendly&#8221;)</li>
<li> Makes reasonable looking diagrams of helical transmembrane proteins</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li> Doesn&#8217;t handle beta-barrel membrane proteins gracefully (strand drawing is overlapped, messy).</li>
<li>The membrane is assumed to be a eukaryotic plasma membrane, with labels &#8220;cytoplasmic/extracellular&#8221; (which should be, for instance, &#8220;extracellular/periplasm&#8221; for a bacterial outer membrane protein). This is easily changed on the diagram with external editing.</li>
</ul>
<p><strong><a href="http://www.pharmazie.uni-kiel.de/chem/Prof_Beitz/textopo.htm">TeXtopo</a></strong></p>
<p>Beitz, E. (2000), <strong>TeXtopo: shaded membrane protein topology  	plots in LaTeX2e</strong>. <em>Bioinformatics</em> <strong>16</strong>: 1050-1051. (<a href="http://resolveref.appspot.com/ref/Bioinformatics/2000/16/1050">link</a>). See the <a href="http://resolveref.appspot.com/ref/Bioinformatics/2000/16/1050">original publication</a> or <a href="http://www.uni-kiel.de/Pharmazie/chem/Prof_Beitz/biotex.html">Professor Eric Beitz&#8217;s site</a> for a better example than my image.</p>
<p><a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/secy_textopo.png" rel="lightbox[60]"><img class="alignright size-medium wp-image-64" title="SecY textopo diagram" src="http://blog.pansapiens.com/wp-content/uploads/2008/06/secy_textopo-300x214.png" alt="" width="300" height="214" /></a></p>
<p><strong>Pros:</strong></p>
<ul>
<li>Beautiful, clean, publication quality diagrams, courtesy of LaTeX</li>
<li>Multiple input options (Swissprot format, PHD, HMMTOP, user defined)</li>
<li>Multiple sequence annotation options including colouring by various physicochemical properties (hydrophobicity, charge), sequence conservation or user defined schemes.</li>
<li>Will depict membrane embedded half-loops and lipid anchors.</li>
<li>Versatile output (Postscript, pdf, dvi, basically anything that LaTeX can be rendered as)</li>
<li>Also can generate attractive looking helical wheel plots</li>
<li>Did I mention the output is clean and looks great &#8230; ?</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li>Steep learning curve for the uninitiated, despite extensive documentation (ie LaTeX code, no GUI)</li>
<li>No support for beta-barrel membrane proteins</li>
</ul>
<p>If I ever need to make a 2D diagram of a helical membrane protein for a publication, TeXtopo would be my first choice. For quickly getting an overview of some transmembrane prediction results or a protein with defined transmembrane regions in UniProt, TMRPres2D is the quickest and easiest method.</p>
<p>In the end, since neither program would do a decent job at cleanly depicting the strands of a beta-barrel in a simple 2D plot, I ended up coding my own hackish solution (<a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/svg_barrel.tar.gz">svg_barrel.tar.gz</a> or <a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/svg_barrel_gui_win32.zip">svg_barrel_gui_win32.zip</a>) using Python and a tweaked version of <em>SVGdraw.py</em>. This allowed me to generate some SVG graphics to use as a starting point, and then hand edit the result in Inkscape to align strands to loosely match the real hydrogen bonding patterns. I also added some simple B&#233;zier curves for the loops, since neat placement of loop residues was the tricky part that I decided I didn&#8217;t have time to tackle.</p>
<p>Here&#8217;s the end result, after hand editing:<br />
    <object type="image/svg+xml" width="400" height="400" data="http://blog.pansapiens.com/wp-content/uploads/2008/06/lamb_2d_barrel.svg"><br />
      <img src="http://blog.pansapiens.com/wp-content/uploads/2008/06/lamb_2d_barrel.jpg" alt="SVG barrel diagram"><br />
    </object></p>
<p>And here is the 3D version, as a point of reference:</p>
<p><a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/lamb_ray.jpg" rel="lightbox[60]"><img class="aligncenter size-medium wp-image-69" title="LamB (1MPQ)" src="http://blog.pansapiens.com/wp-content/uploads/2008/06/lamb_ray.jpg" alt="generated using PyMol (raytraced)" width="272" height="300" /></a></p>
<p>The 2D vector diagram could do with some work to aid in a more accurate representation (unfortunately &#8216;flat&#8217; views of a 3D barrel always have to make some compromises), but it does the job. The goal was to keep it simple &#8230; simple it is. One day I may extend this code to actually use known structure coordinates to automatically align the strands (saving tedious manual alignment), and write some code that properly lays out the loops.</p>
<p>Anyone know any other programs of similar functionality I&#8217;ve missed ?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/06/26/software-review-producing-two-dimensional-diagrams-of-membrane-proteins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>ResolveRef updated : now with auto-suggest and source code</title>
		<link>http://blog.pansapiens.com/2008/06/06/resolveref-updated-now-with-auto-suggest-and-source-code/</link>
		<comments>http://blog.pansapiens.com/2008/06/06/resolveref-updated-now-with-auto-suggest-and-source-code/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 00:26:37 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[two-point-oh]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[gae]]></category>
		<category><![CDATA[Google App Engine]]></category>
		<category><![CDATA[resolveref]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=56</guid>
		<description><![CDATA[I updated ResolveRef last night and checked in the most current source code to svn at Google Code. New features include: Suggest/autocomplete for journal title field, using the journal title lists provided by PubMed. A &#8220;Verify&#8221; button. Allows a ResolveRef URL to be constructed with the web form and verified as working and valid without actually [...]]]></description>
			<content:encoded><![CDATA[<p>I updated <a href="http://resolveref.appspot.com/">ResolveRef</a> last night and checked in the most current source code to svn <a href="http://code.google.com/p/resolveref/">at Google Code</a>.</p>
<p>New features include:</p>
<p><a href="http://blog.pansapiens.com/wp-content/uploads/2008/06/resolveref1.png" rel="lightbox[56]"><img class="alignright size-medium wp-image-58" title="ResolveRef" src="http://blog.pansapiens.com/wp-content/uploads/2008/06/resolveref1-230x300.png" alt="ResolveRef, now prettier, with comments box by disqus." width="230" height="300" /></a></p>
<ul>
<li>Suggest/autocomplete for journal title field, using the journal title lists provided by PubMed.</li>
<li>A &#8220;Verify&#8221; button. Allows a ResolveRef URL to be constructed with the web form and verified as working and valid without actually forwarding the user to the article.</li>
<li>Some bugfixes (handled the case where there is no DOI in the PubMed record, handled network timeouts to PubMed)</li>
<li>Refreshed visuals</li>
<li>Disqus comments box for feedback</li>
</ul>
<p>In the interest of just getting something working quickly, I implemented the suggest feature in the laziest, possibly most RAM- and CPU-hungry way possible (the &#8220;jQuery Suggest&#8221; code queries the web app with substrings as you type each character. On the server side, the app uses a regex to scan a ~1.5 MB list of journal titles held in RAM). I&#8217;ve already noticed a few &#8220;<em>This request used a high amount of CPU</em>&#8221; warnings in the logs, with the threat &#8220;<em>High CPU requests have a small quota, and if you exceed this quota, your app will be temporarily disabled</em>&#8221;. If my nasty hack starts heating up Google&#8217;s datacentre too much, I might have to disable the &#8216;suggest&#8217; feature until I can implement it &#8220;properly&#8221;.</p>
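<p>For illustration only, here is a minimal Python sketch of one cheaper alternative to a regex scan: sort the titles once and binary-search for the typed prefix. This is not the ResolveRef code. The function names are made up for the example, and it only matches titles that <em>start with</em> the typed text, which may be narrower than what the regex version matches.</p>
<pre>import bisect


def build_index(titles):
    """Sort (lowercased title, original title) pairs once, at start-up."""
    return sorted((title.lower(), title) for title in titles)


def suggest(index, typed, limit=10):
    """Return up to `limit` titles that start with the typed text."""
    key = typed.lower()
    if not key:
        return []
    # Binary search for the first entry that is not smaller than the prefix.
    start = bisect.bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:start + limit]:
        # Titles sharing a prefix are adjacent in sorted order, so stop early.
        if not lowered.startswith(key):
            break
        matches.append(original)
    return matches
</pre>
<p>Each keystroke then costs one binary search plus at most <em>limit</em> string comparisons, rather than a pass over the whole list.</p>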
<p><span id="more-56"></span></p>
<h3>Reflections, discoveries</h3>
<p>This idea of implementing OpenRef-style article identifiers has been a fun experiment, and a nice way to learn more about the ins-and-outs of PubMed. When working on implementing the &#8216;suggest&#8217; feature, a major drawback became even more apparent &#8230; journal titles (the <em>[TA]</em> field) used by PubMed are not always easily guessable, and many common abbreviations used in reference lists do not appear to exist in <a href="http://www.ncbi.nlm.nih.gov/entrez/citmatch_help.html#JournalLists">PubMed&#8217;s downloadable flat-file journal title lists</a>. This is the list that ResolveRef uses to make the &#8216;suggestions&#8217;, so having &#8216;missing&#8217; journal titles presents a problem if I want users to be able to painlessly construct ResolveRef URLs.</p>
<p><em>Proc. Natl. Acad. Sci. U.S.A. </em>is a perfect example. Many article bibliographies use <em>PNAS</em> &#8211; that would be my guess if I were trying to create a ResolveRef URL for a <em>PNAS</em> paper &#8211; and yet this journal title does not exist as far as PubMed&#8217;s official journal list is concerned. Issues surrounding this problem were <a href="http://baoilleach.blogspot.com/2008/01/doi-or-doh-proposal-for-restful-unique.html">discussed on Noel&#8217;s original OpenRef post</a>. The odd thing is that if I search the <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=journals">PubMed Journals database</a> for &#8220;PNAS&#8221;, <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=nlmcatalog&amp;doptcmdl=Expanded&amp;cmd=search&amp;Term=7505876[NlmId]">it finds it</a>, and gives me a record where <em>PNAS</em> is listed under &#8220;Other titles(s)&#8221;. If someone could point me to where I can get these extra fields containing additional names for a journal that are not provided in the downloadable flat-files, it would be much appreciated (I bet Alf knows the answer. Or maybe I should email the folks at PubMed). If I can get a better list of titles the &#8216;suggest&#8217; feature in ResolveRef would suddenly become a whole lot more useful. Another way around this may be to use CrossRef, and I&#8217;m looking into that, <a href="http://depth-first.com/articles/2008/05/06/hacking-doi-interconvert-bibliographic-references-and-dois-with-crossref-and-openurl">but I get the feeling that usage of the CrossRef API is more restricted</a>, so I haven&#8217;t bothered with it so far.</p>
<h3>Thoughts about the future of ResolveRef / OpenRef</h3>
<p>At this stage, ResolveRef URLs are not actually identifiers. They simply act like a frontend to a single-hit PubMed search, and several <em>different</em> ResolveRef URLs can return the <em>same</em> DOI URL (and hence the same journal article). A proper identifier would have a one-to-one mapping between the human-readable ResolveRef URLs and a DOI. In the future, I may attempt to get ResolveRef to &#8216;normalize&#8217; URLs by allowing only a single journal title for each journal and forcing the use of volume numbers if present. The user could use the web interface to enter the values, and ResolveRef would return a normalized URL. Only normalized URLs would successfully forward to the DOI URL; others would return an error with &#8220;Did you mean ..<em>insert normalized URL ..?</em>&#8221;. One drawback is that this would reduce the guessability of ResolveRef URLs, but the advantage is that they could be treated like identifiers: one article would have one and only one valid ResolveRef URL. By requiring a tool (like the ResolveRef web form) to help users build a valid URL, and removing some of the guessability, ResolveRef would move a little closer to a <a href="http://hublog.hubmed.org/archives/001601.html">reinvention of OpenURL</a> (although I think OpenRef/ResolveRef URLs are still more readable and cleaner than OpenURLs, and are much more guessable if you have a bibliography in front of you).</p>
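<p>A rough Python sketch of what such normalization might look like, assuming the <em>/ref/journal/year/volume/page</em> layout and a small hand-made alias table (both the table and the function name are hypothetical, not part of ResolveRef):</p>
<pre>from urllib.parse import quote

# Hypothetical alias table: every accepted spelling maps to one canonical title.
CANONICAL_TITLES = {
    "pnas": "Proc Natl Acad Sci U S A",
    "proc natl acad sci usa": "Proc Natl Acad Sci U S A",
    "proc natl acad sci u s a": "Proc Natl Acad Sci U S A",
}


def normalize_ref(journal, year, volume, page):
    """Return the single canonical /ref/ path, or None for an unknown journal."""
    # Drop full stops, collapse whitespace and ignore case before the lookup.
    key = " ".join(journal.replace(".", " ").lower().split())
    canonical = CANONICAL_TITLES.get(key)
    if canonical is None:
        return None
    parts = [canonical, str(year), str(volume), str(page)]
    return "/ref/" + "/".join(quote(part) for part in parts)
</pre>
<p>A resolver built this way would forward only when the requested path equals the value returned by <em>normalize_ref</em>, and otherwise offer that value as the &#8220;Did you mean&#8221; suggestion.</p>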
<p>A key cosmetic (and philosophical) difference between OpenURL and OpenRef/ResolveRef URLs is that OpenURL uses HTTP GET fields, eg <em>?title=bla&amp;issn=12345</em>, while OpenRef/ResolveRef uses the URL path itself, eg <em>somejournalname/2008/4/1996</em>. It&#8217;s a bit like one scheme was designed in the age of CGI scripts, while the other was designed for web applications capable of more RESTful behaviour. In my mind OpenURL is more versatile but much uglier, while OpenRef is cleaner and simpler but can only reference journal articles. OpenRef-style URLs will never be able to reference the breadth of resources that an OpenURL can theoretically handle. Maybe a hybrid solution could work &#8230; some kind of OpenURL server that could &#8220;speak OpenRef&#8221; &#8230; accepting OpenRef-style URLs where possible, while still dealing with regular OpenURL style &#8220;<em>?bla=blarg&amp;</em>&#8221; query strings for everything else.</p>
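<p>To make the hybrid idea concrete, here is a small Python sketch of a parser that accepts either style and returns the same fields. The query-string keys (<em>journal</em>, <em>year</em>, <em>volume</em>, <em>page</em>) and the <em>/openurl</em> path are invented for this example and are not the real OpenURL field names.</p>
<pre>from urllib.parse import parse_qs, unquote, urlencode, urlsplit

FIELDS = ("journal", "year", "volume", "page")


def parse_citation_url(url):
    """Return a dict of citation fields from either URL style."""
    parts = urlsplit(url)
    if parts.query:
        # Query-string style: fields arrive as key=value pairs.
        query = parse_qs(parts.query)
        return {name: query[name][0] for name in FIELDS if name in query}
    # Path style: fields are positional segments after /ref/.
    segments = parts.path.split("/")[2:]
    return {name: unquote(seg) for name, seg in zip(FIELDS, segments) if seg}


citation = {"journal": "Protein Sci", "year": "1999", "volume": "8", "page": "689"}
path_style = "/ref/Protein%20Sci/1999/8/689"
query_style = "/openurl?" + urlencode(citation)
print(parse_citation_url(path_style) == parse_citation_url(query_style))
</pre>
<p>Empty path segments are simply left out of the result, so an incomplete URL such as <em>/ref/Organic%20Letters/2000//</em> yields only the journal and year.</p>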
<p>As far as I can tell OpenURLs are not <em>identifiers</em> with a one-to-one URL-to-article mapping &#8211; this is a drawback since you could not do a Google search to reliably find sites that reference an article via its OpenURL &#8230; you theoretically could do this with a normalized OpenRef/ResolveRef URL, since there will only be one unique string used to reference any one article (as Noel pointed out, OpenRef strings have some properties akin to InChI strings). Obviously to do this cleanly, ResolveRef would need a nicer domain (something akin to dx.doi.org).</p>
<p>Anyhow, I&#8217;m not expecting ResolveRef / OpenRef to make any impact on anything anywhere anytime soon. I&#8217;m not a librarian, I don&#8217;t sit on an <a href="http://listserv.oclc.org/scripts/wa.exe?A0=OPENURL">NISO/ANSI committee</a>, and I don&#8217;t see publishers seeing a need to adopt anything beyond the DOI. But it&#8217;s been nice to play around with, and I&#8217;m likely to continue doing so.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/06/06/resolveref-updated-now-with-auto-suggest-and-source-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>ResolveRef : looking at the logs</title>
		<link>http://blog.pansapiens.com/2008/06/01/resolveref-looking-at-the-logs/</link>
		<comments>http://blog.pansapiens.com/2008/06/01/resolveref-looking-at-the-logs/#comments</comments>
		<pubDate>Sun, 01 Jun 2008 08:29:24 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[biopython]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[two-point-oh]]></category>
		<category><![CDATA[web2.0]]></category>
		<category><![CDATA[gae]]></category>
		<category><![CDATA[Google App Engine]]></category>
		<category><![CDATA[resolveref]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=54</guid>
		<description><![CDATA[One of the nice features of Google App Engine is you can easily view logs for your application to quickly see requests generating errors. Browsing the logs of ResolveRef, I&#8217;ve been able to identify a few classes of query which, for one reason or another, weren&#8217;t working. Firstly, there is the &#8220;just testing and don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>One of the nice features of <a href="http://code.google.com/appengine/">Google App Engine</a> is you can easily view logs for your application to quickly see requests generating errors. Browsing the logs of <a href="http://resolveref.appspot.com/">ResolveRef</a>, I&#8217;ve been able to identify a few classes of query which, for one reason or another, weren&#8217;t working.</p>
<p><span id="more-54"></span></p>
<p>Firstly, there is the &#8220;just testing and don&#8217;t actually have a citation on hand to key-in&#8221; class of users, that tried requests something like:</p>
<blockquote>
<h5><span class="file">/ref/xx/2007//</span></h5>
</blockquote>
<p>Not much sympathy here &#8230; it&#8217;s pretty much like dialing a random phone number and hoping someone will pick up.</p>
<p>Then there is a class of users who appear to have sensible intentions, but provide incomplete ResolveRef URLs, eg:</p>
<blockquote>
<h5><span class="file">/ref/Organic%20Letters/2000//</span></h5>
</blockquote>
<p>Maybe I poorly described ResolveRef in the initial announcement, maybe the documentation in the &#8220;About&#8221; box on the ResolveRef site is unclear or maybe these users just didn&#8217;t read the docs in the first place. When I described the service as &#8220;A RESTful way to do PubMed searches&#8221;, maybe it would have been more accurate to say &#8220;A simple, RESTful way to resolve a <em><strong>single</strong></em> journal article using only the human-readable citation information&#8221;. ResolveRef does not give a <em>list</em> of results to a PubMed search; it forwards to a <em>single hit</em> (ideally the requested article), or gives an error if it can&#8217;t be found. By the looks of it, many users seem to want to use ResolveRef as a way to retrieve a list of results. While this goes against the original spirit of ResolveRef being a resolver for an [almost] <em>unique identifier</em> for journal articles (akin to <a href="http://baoilleach.blogspot.com/2008/01/doi-or-doh-proposal-for-restful-unique.html">Noel&#8217;s OpenRef proposal</a>), I may be tempted to update ResolveRef to return a list of hits in the future (or just forward to the <a href="http://hubmed.org">HubMed</a> or PubMed results page).</p>
<p>There are also some <em>actual</em> bugs which throw nasty Python tracebacks (I think this one was actually me trying to use ResolveRef to look up a reference at work):</p>
<blockquote><p><strong><br />
/ref/Protein%20Sci/1999/8/689</strong></p></blockquote>
<p>This threw an error since ResolveRef (stupidly) assumed that every PubMed record has an associated DOI &#8230; however for some reason this Protein Science article does not have a DOI recorded in PubMed, so it fails to resolve with ResolveRef. This is (yet another) drawback to using PubMed as a backend. I&#8217;m thinking I may need to make ResolveRef <a href="http://depth-first.com/articles/2008/05/06/hacking-doi-interconvert-bibliographic-references-and-dois-with-crossref-and-openurl">interface with CrossRef</a> somehow too, since that may act as a backup (or complete replacement) for these cases.</p>
<p>There also seem to be occasional errors generated when the HTTP connection from the Google App servers to PubMed fails; my fault entirely &#8230; that type of exception should always be anticipated and caught in a networked application.</p>
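<p>As a sketch of how both failure cases could be handled (written in present-day Python rather than as the actual ResolveRef code), the lookup can return a status instead of letting exceptions escape. Here <em>fetch_record</em> is a hypothetical callable standing in for the PubMed query, and falling back to the PubMed record page when there is no DOI is just one possible choice.</p>
<pre>def resolve(citation, fetch_record):
    """Return a (status, value) pair for one citation lookup.

    fetch_record is a hypothetical callable that asks PubMed for a single
    record and returns it as a dict, or None when nothing matches.
    """
    try:
        record = fetch_record(citation)
    except OSError as error:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        return ("error", "PubMed could not be reached: %s" % error)
    if record is None:
        return ("error", "No matching article found")
    doi = record.get("doi")
    if doi:
        return ("redirect", "http://dx.doi.org/" + doi)
    # No DOI recorded in PubMed: fall back to the PubMed page for the record.
    return ("redirect", "http://www.ncbi.nlm.nih.gov/pubmed/" + str(record["pmid"]))
</pre>
<p>The request handler can then turn an <em>error</em> status into a friendly message rather than a traceback.</p>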
<p>Apart from guessing how people may like to use the application by examining the logs, <span class="gray"><a href="http://appgallery.appspot.com/about_app?app_id=agphcHBnYWxsZXJ5chMLEgxBcHBsaWNhdGlvbnMYnAcM"><em>edoardo.marcora</em> also suggested that autocomplete/suggest</a> for the journal field would be nice. I agree &#8230; this was a feature I was working on prior to the initial release, but it was taking too long so I just launched ResolveRef without it.</span></p>
<p>There is a new version in the pipeline, and it will be ready for release soon. I&#8217;ll also put it on Google Code, warts and all. I already have the &#8220;suggest&#8221; functionality working, and once I resolve the few bugs discussed above, I&#8217;ll push out an update. Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/06/01/resolveref-looking-at-the-logs/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
		<item>
		<title>texshade: useful, and still kickin&#8217;</title>
		<link>http://blog.pansapiens.com/2008/04/29/texshade-useful-and-still-kickin/</link>
		<comments>http://blog.pansapiens.com/2008/04/29/texshade-useful-and-still-kickin/#comments</comments>
		<pubDate>Tue, 29 Apr 2008 08:53:52 +0000</pubDate>
		<dc:creator>Andrew Perry</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://blog.pansapiens.com/?p=50</guid>
		<description><![CDATA[I&#8217;ve been looking at doing an analysis with some protein subfamily sequence logos, using Eric Beitz&#8217;s texshade. While it&#8217;s a little strange that it does the actual analysis part (rather than just the rendering) using LaTeX, it&#8217;s the only implementation of the method I know of, and it beats reimplementing it from the paper. Although [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been looking at doing an analysis with some protein subfamily sequence logos, using <a href="http://www.ctan.org/tex-archive/help/Catalogue/entries/texshade.html">Eric Beitz&#8217;s texshade</a>. While it&#8217;s a little strange that it does the actual analysis part (rather than just the rendering) using LaTeX, it&#8217;s the only implementation of the method I know of, and it beats reimplementing it from <a href="http://www.biomedcentral.com/1471-2105/7/313">the paper</a>.</p>
<p>Although it was published in 2006 (<a href="http://www.ncbi.nlm.nih.gov/pubmed/10842735">and earlier in 2000</a>), with the original URLs now dead, I noticed the latest update for <a href="http://www.ctan.org/tex-archive/help/Catalogue/entries/texshade.html">the version of texshade in CTAN</a> (v1.18) was on the 15th of April, 2008 &#8230; i.e. texshade was updated just 14 days ago!</p>
<p>It happens all too often that published bioinformatics tools cease to be updated or even disappear from the Web not long after the peer-reviewed publication is released. Kudos to Eric for not abandoning his software.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.pansapiens.com/2008/04/29/texshade-useful-and-still-kickin/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<creativeCommons:license>http://creativecommons.org/publicdomain/zero/1.0/</creativeCommons:license>
	</item>
	</channel>
</rss>
