Delicious geohashes … mmmm … tagging *drool*

Since I got a new toy for Christmas, I’ve become interested in geolocation and the fun things you can do when you have an internet-connected GPS-enabled device in your pocket. I’m also a compulsive delicious tagger, so I quickly discovered the existing practice for geotagging delicious bookmarks.

Essentially, this seems to be: add the tag ‘geotagged’, along with the tags ‘geo:lat=X.xxx’ and ‘geo:lon=X.xxx’, where the X.xxx values are the latitude and longitude numbers that are likely to come straight out of your GPS, in decimal degrees (WGS84).

This is all very nice, but the problem with tags in this format is that there is no easy or efficient way to use them to retrieve all items tagged for a particular locality. Sure, if I’m standing right on top of the Eureka Tower at -37.821362,144.964213, I can search for tags geo:lat=-37.821362 and geo:lon=144.964213 to find all the geotagged links for that exact location, but what if I’m standing 50 metres away across the street, looking up at the tower, and want to search for links near my current location?

Enter the geohash, a hash function for geolocation coordinates invented by Gustavo Niemeyer (not to be confused with the xkcd Spontaneous Adventure Generation algorithm of the same name). Wikipedia gives a reasonable explanation of how geohashes work … essentially the latitude and longitude are encoded together as a single string like r1r0fdzdwg. Geohashes have the useful property of having arbitrary precision … geohashes with the same prefix represent locations in the same vicinity. This means that the location across the street from the Eureka Tower, at geohash r1r0fdy7sm, shares the prefix r1r0fd with the geohash closest to the top of the Eureka Tower, at r1r0fdzdwg.
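To make the prefix property concrete, here is a minimal sketch of the standard geohash encoding (alternating longitude and latitude bits, five bits per base-32 character). The function names geohash_encode and common_prefix are my own, chosen for illustration; this is not taken from any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length=10):
    """Encode decimal-degree coordinates as a geohash of `length` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    for i in range(length * 5):
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1  # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value = value * 2      # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        if i % 5 == 4:             # every 5 bits become one character
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def common_prefix(a, b):
    """Return the leading characters shared by two geohashes."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return "".join(shared)


tower = geohash_encode(-37.821362, 144.964213)
across_the_street = "r1r0fdy7sm"  # geohash quoted in the text above
print(tower)                                    # r1r0fdzdwg
print(common_prefix(tower, across_the_street))  # r1r0fd
```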

My proposal for delicious geotaggers is that in addition to the geo:lat and geo:lon tags, several truncated geo:hash tags should also be used. If I were to bookmark something related to the Eureka Tower, I might tag it:

geotagged
geo:lat=-37.821362
geo:lon=144.964213
geo:hash=r1r0fdzdwg
geo:hash=r1r0fdz
geo:hash=r1r0f

Then, anyone searching for the tag geo:hash=r1r0f will find every item within the area that this geohash covers … this would include not only the Eureka Tower, but the Rialto Towers too.
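The tagging scheme is simple enough to sketch in a few lines. Below, build_geo_tags and find_by_tag are hypothetical helpers of my own, and the bookmarks dictionary is only a stand-in for delicious (it does not use the delicious API, and the URL is a placeholder). The truncation lengths 10, 7 and 5 are the ones from the example above.

```python
def build_geo_tags(lat, lon, geohash, lengths=(10, 7, 5)):
    """Build the proposed tag set from coordinates and a full geohash."""
    tags = ["geotagged", f"geo:lat={lat}", f"geo:lon={lon}"]
    for n in lengths:
        if n <= len(geohash):  # skip truncations longer than the hash itself
            tags.append(f"geo:hash={geohash[:n]}")
    return tags


def find_by_tag(bookmarks, tag):
    """Return the URLs of all bookmarks carrying exactly this tag."""
    return [url for url, tags in bookmarks.items() if tag in tags]


# Stand-in for a delicious account: URL -> list of tags (placeholder URL).
bookmarks = {
    "https://example.com/eureka-tower": build_geo_tags(
        -37.821362, 144.964213, "r1r0fdzdwg"
    ),
}

print(bookmarks["https://example.com/eureka-tower"])
print(find_by_tag(bookmarks, "geo:hash=r1r0f"))
```

The point of the design is that the search side stays a plain exact-match tag lookup, which is all delicious offers; the locality information is carried entirely by the truncated tags.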

For each bookmarked item, the number of truncated geohashes used as tags roughly determines the distance ranges (i.e. bounding boxes) that can be searched. Exactly which truncations, or how many geohash tags to use, is an open problem that I haven’t yet settled on the best solution for. Is it best to ‘overload’ with every possible geohash truncation (e.g. include tags geo:hash=r1r0fdzdwg, geo:hash=r1r0fdzdw, geo:hash=r1r0fdzd, geo:hash=r1r0fdz …etc… to geo:hash=r)? This is probably overkill. A better approach would be to choose just a few key truncations that roughly correlate to a range of sensibly sized patches on the Earth’s surface, e.g. bounding boxes with north-south extents of roughly:

  • geo:hash=r1r0fdzdwg ~60 cm [‘exact’]
  • geo:hash=r1r0fdzd ~20 m
  • geo:hash=r1r0fdz ~150 m
  • geo:hash=r1r0fd ~600 m
  • geo:hash=r1r0f ~4.8 km
  • geo:hash=r1r0 ~19.5 km
  • geo:hash=r1r ~150 km
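These sizes can be sanity-checked from the bit layout: each character carries five bits, split alternately between longitude and latitude, and each bit halves the range. The sketch below is a rough calculation only. It assumes a spherical Earth at about 111.32 km per degree of latitude, and cell_size_m is a name I made up for it.

```python
import math

METRES_PER_DEGREE = 111_320  # assumption: spherical Earth, metres per degree of latitude


def cell_size_m(length, lat=0.0):
    """Approximate (height, width, diagonal) in metres of a geohash cell.

    `lat` is the latitude at which the east-west width is measured,
    since cells get narrower towards the poles.
    """
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    height = 180 / 2 ** lat_bits * METRES_PER_DEGREE
    width = 360 / 2 ** lon_bits * METRES_PER_DEGREE * math.cos(math.radians(lat))
    return height, width, math.hypot(height, width)


for n in (10, 8, 7, 6, 5, 4, 3):
    height, width, diagonal = cell_size_m(n, lat=-37.82)
    print(f"{n:2d} chars: {height:12.1f} m N-S  {width:12.1f} m E-W  {diagonal:12.1f} m diagonal")
```

Under these assumptions the north-south extents come out close to the figures listed above, while the east-west extents and the diagonals differ with latitude and with whether the hash length is odd or even.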

These ranges map loosely to those deemed useful by Brightkite, which lets you search for events around you within 20 m, 200 m, 2 km, 4 km, 10 km, 50 km and 100 km. Maybe we only need a few of these. If only a few truncations were provided by the tagger, the user can always execute multiple searches, starting from the full geohash of their current location and truncating back character by character (effectively expanding the search radius) until they start to get hits. There may also be techniques whereby the last character(s) of the truncated hash can be incremented/decremented to search neighboring bounding boxes (e.g. for r1r0fd, also search for r1r0fc and r1r0fe tags), although I need to think about this a little more.
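Here is a rough sketch of that expanding search. For the neighbouring boxes it does not increment characters; instead it swaps in a different technique: decode the cell to its bounding box, then re-encode the centres of the eight surrounding cells. All function names are my own, and items_by_tag is a hypothetical dictionary standing in for a delicious tag lookup.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, length):
    """Standard geohash encoding: alternate lon/lat bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, use_lon = [], 0, True
    for i in range(length * 5):
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value, rng[0] = value * 2 + 1, mid
        else:
            value, rng[1] = value * 2, mid
        use_lon = not use_lon
        if i % 5 == 4:
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def bounding_box(geohash):
    """Decode a geohash to (south, north, west, east) in degrees."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]


def neighbours(geohash):
    """Geohashes of the (up to) eight cells surrounding the given cell."""
    south, north, west, east = bounding_box(geohash)
    height, width = north - south, east - west
    lat_c, lon_c = (south + north) / 2, (west + east) / 2
    result = set()
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            if dlat == 0 and dlon == 0:
                continue
            lat = lat_c + dlat * height
            if not -90 < lat < 90:
                continue  # no cell beyond the poles
            lon = (lon_c + dlon * width + 180) % 360 - 180  # wrap at +/-180
            result.add(encode(lat, lon, len(geohash)))
    return result


def expanding_search(my_geohash, items_by_tag, min_length=3):
    """Truncate character by character until some tag produces hits."""
    for length in range(len(my_geohash), min_length - 1, -1):
        prefix = my_geohash[:length]
        hits = set()
        for cell in {prefix} | neighbours(prefix):
            hits |= set(items_by_tag.get("geo:hash=" + cell, ()))
        if hits:
            return prefix, hits
    return None, set()


print(sorted(neighbours("r1r0fd")))
print(expanding_search("r1r0fdy7sm", {"geo:hash=r1r0f": {"Eureka Tower link"}}))
```

For r1r0fd the neighbours happen to include r1r0fc and r1r0fe, but that is not a general rule: the base-32 characters follow a Z-shaped path through the cell, so alphabetically adjacent characters are not always spatially adjacent.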

Of course, the best solution for more useful geotagging within delicious would be for delicious/Yahoo to explicitly support some style of geotagging and provide a geotag-aware search facility … but until that day, geohashes may do the job well enough. Next step for me: write a proof of concept application that actually produces and makes use of these types of tags.
