The Ghost Blogs of Tibet

Jenny, 22, claims to lead an angst-filled life in the Arctic Ocean (80.46 N, 36.87 E), but she really lives near Winston-Salem (36.87 N, 80.46 W).

Musically inclined shepherds in the desolate highlands of Kachi, China, might be extremely excited to learn that it's only a short camel trek to Jazz Guitar Resources, conveniently located near the Kyrgyz border, right next to the Kwik-E-Mart (40.5 N, 76.5 E). How sad they will be to find that JGR is really somewhere near Harrisburg, Pennsylvania (40.5 N, 76.5 W).

In fact, a look at the map on the GeoURL homepage reveals a dense packing of phantom American weblogs out in Tibet and the Central Asian steppes, a region not known for its lively expatriate community or access to broadband.

Latitude/longitude coordinates form a beautiful synergy with Murphy's Law. In the GeoURL scheme, for example, there are seven ways to get your coordinates wrong, yet still have them be valid. You can give the wrong sign for latitude, for longitude, or for both; you can list the coordinates in the wrong order; or you can do some of each. The sign problem is especially subtle - latitude and longitude are often shown using unsigned numbers and the letters N, S, E, W to indicate hemisphere, so it's very easy to forget to add a minus sign if you live below the equator, or in the New World.
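To make the count of seven concrete, here is a minimal sketch that enumerates every swapped or sign-flipped reading of a coordinate pair and keeps the ones that are still valid. The function names (`valid`, `variants`) are made up for illustration and are not part of GeoURL. One caveat the arithmetic turns up: the full seven are only available when the longitude is no larger than 90 degrees in magnitude, because otherwise the transposed versions stop being legal latitudes.

```python
from itertools import product

def valid(lat, lon):
    """A coordinate pair is valid if it fits the usual lat/lon ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

def variants(lat, lon):
    """Return every distinct wrong reading of (lat, lon) that is still valid.

    Three independent mistakes are combined: transposing the pair,
    negating the first number, and negating the second number.
    """
    out = []
    for swap, flip_first, flip_second in product((False, True), repeat=3):
        a, b = (lon, lat) if swap else (lat, lon)
        if flip_first:
            a = -a
        if flip_second:
            b = -b
        cand = (a, b)
        # Skip the correct reading, invalid pairs, and duplicates.
        if cand != (lat, lon) and valid(*cand) and cand not in out:
            out.append(cand)
    return out

# Winston-Salem from the example above: 36.87 N, 80.46 W.
for lat, lon in variants(36.87, -80.46):
    print(lat, lon)
```

Jenny's Arctic Ocean address, (80.46, 36.87), is one of the seven pairs printed: a transposition plus a dropped minus sign.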

[Image: Plot of USA on world map with reversed longitude coordinates]

Some of the errors are easy to spot. In a previous post, I described a suspicious cluster of blogs in the Horn of Africa, many of them in the Indian Ocean, which turned out to be a latitude/longitude transposition error made by some German and Czech bloggers. Finding a cluster of red dots in water was an easy tip-off. The inverted USA in Central Asia is similarly obvious, partly because the area in question is so sparsely inhabited, and partly because the reversed blogs actually form a mirror image of the U.S. East Coast.

But other cases are not so easy to disambiguate. A single pair of coordinates can give locations in Sardinia, Portugal, Tanzania, or Brazil, depending on how they are arranged. Even a language algorithm (which has its own assumptions and potential for error) won't help distinguish a Portuguese blog from a Brazilian one. European or African blogs close enough to the prime meridian can flip longitude with no real chance of detection.

But enough talk. Let's get to the bullet points.

Here's what I've learned this weekend about geographical markup:

  • The standard latitude/longitude system used in META tags invites mistakes, because transposition and sign errors still generate valid coordinates. If you ask users to generate their own metadata, it's important to use a foolproof format.
  • Any kind of visual feedback on the GeoURL site would help reduce errors. Showing a globe with the selected blog displayed as a red dot, for example, would let users do a sanity check on their page and make sure they got their META tag right.
  • Central Asia is ENORMOUS.
  • Most of the mistakes on the GeoURL site are due to one website - DeviantART.com - which seems to provide a lot of bum data. This suggests that a small amount of effort directed at the main offender could go a long way to improving the quality of GeoURL data.
  • Finding an algorithmic way of flagging incorrect coordinates is orders of magnitude harder than getting users to get their numbers right.
  • Each bum blog in the data set degrades the value of the entire data collection.
  • It's really hard to find a tool for doing something like plotting a map of the US with reversed longitude. I finally ended up using Photoshop and the incredible, amazing, unbelievably cool Earth Viewer at Fourmilab. I can't go on enough about the Earth Viewer - be sure to try all the various options, and prepare to be amazed.
  • There are some artefacts on the GeoURL map that are not easy to explain in terms of transposition errors. For example, note the diagonal line running from southeast to northwest, approximating the line where latitude is equal to longitude.
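As a rough illustration of the sanity check suggested in the list above, the sketch below assumes the site has some second hint about where a blog is, here a crude bounding box for the contiguous USA. The box values, `USA_BOX`, and the function names are all assumptions made up for this example, not GeoURL features. If the stated coordinates fall outside the box, the code looks for a transposed or sign-flipped reading that falls inside it.

```python
from itertools import product

# Rough, hand-picked bounding box for the contiguous USA (an assumption
# for illustration only): (south, north, west, east) in degrees.
USA_BOX = (24.0, 50.0, -125.0, -66.0)

def in_box(lat, lon, box):
    """True if (lat, lon) lies inside the (south, north, west, east) box."""
    south, north, west, east = box
    return south <= lat <= north and west <= lon <= east

def candidate_fixes(lat, lon, box):
    """Return swapped/sign-flipped readings of (lat, lon) that land in box."""
    fixes = []
    for swap, s1, s2 in product((False, True), (1, -1), (1, -1)):
        a, b = (lon, lat) if swap else (lat, lon)
        cand = (s1 * a, s2 * b)
        if cand != (lat, lon) and in_box(*cand, box) and cand not in fixes:
            fixes.append(cand)
    return fixes

def check(lat, lon, box):
    """Return ("ok", []) or ("suspect", [plausible corrections])."""
    if in_box(lat, lon, box):
        return "ok", []
    return "suspect", candidate_fixes(lat, lon, box)

# The Jazz Guitar Resources coordinates as published: 40.5 N, 76.5 E.
print(check(40.5, 76.5, USA_BOX))
```

This only works because the box supplies outside information. Without a hint like that, the Portugal-versus-Brazil kind of ambiguity stays unresolved, which is the sense in which flagging is so much harder than getting the numbers right in the first place.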

Now for the awards portion of the post:

Starc.deviantart.com wins the Dick Cheney Prize for Most Undisclosed Location - the META tag puts Starc somewhere in Chad, the first user profile on the page says Starc is in Canada, and the second user profile says that Starc lives in Texas.

DeviantART.com itself wins the overall Most Useless Geodata award, for putting half its American users in Tibet and Central Asia.

Fourmilab wins the I Can't Believe This is Free award for best geographical website.

Further honors will go to anyone who can write in with more interesting examples of transpositions in the GeoURL data set, or point me to tools that can easily generate inverted-longitude and -latitude maps for countries and regions, to aid in the hunt for phantom blogs.
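For anyone tempted to build such a tool, one possible starting point is to apply a single mistake to every point of a region's outline and hand the result to whatever plotting software is available. This is only a sketch: the `mirror` function and its parameters are hypothetical, and the two sample points are simply the real locations mentioned earlier in this post rather than a proper outline.

```python
def mirror(points, flip_lat=False, flip_lon=False, transpose=False):
    """Apply one coordinate mistake to every (lat, lon) point in a list.

    The transposition is applied first, then the sign flips, so flip_lat
    negates whatever number ends up in the latitude slot. Points that are
    no longer valid coordinates after the change are dropped.
    """
    out = []
    for lat, lon in points:
        if transpose:
            lat, lon = lon, lat
        if flip_lat:
            lat = -lat
        if flip_lon:
            lon = -lon
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            out.append((lat, lon))
    return out

# Winston-Salem and Harrisburg, as given in the post (west is negative).
east_coast = [(36.87, -80.46), (40.5, -76.5)]

# Dropping the minus sign on longitude sends both points to Central Asia.
print(mirror(east_coast, flip_lon=True))
```

Feeding a full coastline through `mirror` with each combination of options would give the phantom outlines to look for on the GeoURL map.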

Finally, I should make it clear that I don't mean this post to heap dirt on the GeoURL project. This kind of stuff happens wherever there are many users and a potential for error. GeoURL has a large enough data set to make these patterns visible, and maps are something we can all understand. But the deeper point is that we are all fallen in the eyes of the metadata god.

Your Host

Maciej Cegłowski


Please ask permission before reprinting full-text posts or I will crush you.