
Using Google App Engine With Amazon Web Services

(My non-technical readers should pull the ripcord here)

Sometimes it can be handy to duct-tape Google App Engine (GAE) to other web tools, such as Amazon's S3 (storage) or SQS (message queue) service.

For example, I have been building a little search engine that can import and index a complete list of bookmarks from a del.icio.us account. Depending on how many bookmarks the account contains, this import can take tens of seconds.

Since GAE doesn't allow you to run background tasks, the browser will hang while this import runs. If it takes too much time (which can happen for large collections of bookmarks), the import risks being killed by the GAE hosting environment. But even if it finishes before being killed, the user is stuck looking at what appears to be an unresponsive browser page for however long it takes to complete.

To avoid this problem, I have rigged the upload form handler in my app to store the user's uploaded bookmarks file in S3 and then put a message on an SQS queue. A faraway worker process (living in a cloud on an EC2 server) polls this queue, dutifully retrieves the file, does its indexing magic and then uploads the bookmarks into the user's account using GAE's bulk loading API. While this is happening in the background, the user can continue to interact with the web application as usual. After a few seconds, imported bookmarks begin to appear in the user's account, and within a few minutes the account is up to date.
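The shape of that handoff can be sketched in a few lines. This is only an illustration, not the code my app runs: a plain dict stands in for S3, a standard-library queue stands in for SQS, and the names handle_upload and run_worker_once are made up for the example.

```python
import queue

storage = {}          # stand-in for S3: key -> file contents (assumption)
jobs = queue.Queue()  # stand-in for the SQS queue (assumption)


def handle_upload(user, bookmarks_file):
    """Web side: stash the file, enqueue a pointer to it, return right away."""
    key = "imports/%s" % user
    storage[key] = bookmarks_file
    jobs.put({"user": user, "key": key})
    return "Import queued"


def run_worker_once(index):
    """Worker side: take one job, fetch its file, index each bookmark line.

    Returns False when the queue is empty so the caller can sleep and retry.
    """
    try:
        job = jobs.get_nowait()
    except queue.Empty:
        return False
    for line in storage.pop(job["key"]).splitlines():
        if line.strip():
            index.setdefault(job["user"], []).append(line.strip())
    return True


if __name__ == "__main__":
    index = {}
    print(handle_upload("alice", "http://a.example\nhttp://b.example\n"))
    while run_worker_once(index):
        pass
    print(index)
```

The point of the split is that handle_upload does nothing slow: it only writes a file and a message, so the request returns immediately and the heavy lifting happens wherever the worker lives.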

This workers + queue setup is a very common way of handling asynchronous tasks in web apps, but setting up communication between GAE and Amazon Web Services can be tricky due to security restrictions in Google's Python runtime. In particular, any Python module that wraps a socket, including urllib, is disallowed. GAE instead requires that you use its custom URL loader. This means that the standard SQS and S3 Python modules provided by Amazon won't work without some modifications.
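One way to make such a module portable is to stop importing the transport and pass it in instead. The sketch below is hypothetical (HttpClient and Response are names invented here, not part of Amazon's modules): it accepts any callable that takes a URL, payload, method and headers and returns an object with status_code and content attributes, which is the shape of what GAE's urlfetch.fetch returns.

```python
from collections import namedtuple

# Minimal response shape used by this sketch: status code plus raw body.
Response = namedtuple("Response", ["status", "body"])


class HttpClient:
    """Tiny HTTP client whose transport is injected rather than imported."""

    def __init__(self, fetch):
        # `fetch` is any callable taking (url, payload, method, headers).
        # On GAE this could wrap google.appengine.api.urlfetch.fetch, e.g.
        #   lambda url, payload, method, headers: urlfetch.fetch(
        #       url, payload=payload, method=method, headers=headers)
        # (untested here; treat that wiring as an assumption).
        self._fetch = fetch

    def request(self, method, url, data=None, headers=None):
        result = self._fetch(url, data, method, headers or {})
        return Response(result.status_code, result.content)

    def get(self, url, headers=None):
        return self.request("GET", url, None, headers)

    def put(self, url, data, headers=None):
        return self.request("PUT", url, data, headers)
```

Outside GAE the same class can be handed a fetch function built on urllib, so the calling code does not need to know which environment it is running in.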

I've put together versions of both modules that are usable from within GAE. The module for talking to S3 is a simple patch of Amazon's boilerplate module to use GAE's URL fetcher instead of urllib. The SQSUrlBuilder module is a factory for generating properly signed queue-manipulation URLs.

Your Host

Maciej Cegłowski


Please ask permission before reprinting full-text posts or I will crush you.