13.08.2008

Using Google App Engine With Amazon Web Services

(My non-technical readers should pull the ripcord here)

Sometimes it can be handy to duct-tape Google App Engine to other web tools, such as Amazon's S3 (storage) or SQS (message queue) service.

For example, I have been building a little search engine that can import and index a complete list of bookmarks from a del.icio.us account. Depending on how many bookmarks the account contains, this import can take a few dozen seconds.

Since GAE doesn't allow you to run background tasks, the browser will hang while this import runs. If it takes too much time (which can happen for large collections of bookmarks), the import risks being killed by the GAE hosting environment. But even if it finishes before being killed, the user is stuck looking at what appears to be an unresponsive browser page for however long it takes to complete.

To avoid this problem, I have rigged the upload form handler in my app to store the user's uploaded bookmarks file in S3 and then put a message on an SQS queue. A faraway worker process (living in a cloud on an EC2 server) polls this queue, dutifully retrieves the file, does its indexing magic and then uploads the bookmarks into the user's account using GAE's bulk loading API. While this is happening in the background, the user can continue to interact with the web application as usual. After a few seconds, imported bookmarks begin to appear in their account, and within a few minutes the account is up to date.
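Here is a rough sketch of that flow, with in-memory stand-ins so it runs anywhere. Everything named here is made up for illustration: `blob_store` plays the part of S3, `job_queue` plays SQS, `accounts` plays the app's datastore, and the "indexing" is just splitting the file into lines. The real thing talks to Amazon and to GAE's bulk loader instead.

```python
import json
import queue

blob_store = {}            # stand-in for S3 (hypothetical)
job_queue = queue.Queue()  # stand-in for an SQS queue (hypothetical)
accounts = {}              # stand-in for the app's datastore (hypothetical)


def handle_upload(user, file_bytes):
    """Form handler: stash the file, enqueue a job, return right away."""
    key = "imports/%s" % user
    blob_store[key] = file_bytes
    job_queue.put(json.dumps({"user": user, "key": key}))
    return "Import queued"


def run_worker_once():
    """Worker: take one job if there is one, index it, load the results.

    Returns True if a job was processed, False if the queue was empty.
    """
    try:
        message = job_queue.get_nowait()
    except queue.Empty:
        return False
    job = json.loads(message)
    data = blob_store.pop(job["key"])
    # Placeholder "indexing": one bookmark URL per non-empty line.
    urls = [line.strip() for line in data.decode("utf-8").splitlines()
            if line.strip()]
    accounts.setdefault(job["user"], []).extend(urls)
    return True
```

The point of the split is that `handle_upload` does nothing slow, so the request returns immediately, while `run_worker_once` can take as long as it likes on a machine that has no request deadline.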

This workers + queue setup is a very common way of handling asynchronous tasks in web apps, but setting up communication between GAE and Amazon Web Services can be tricky due to security restrictions in Google's Python runtime. In particular, any Python module that wraps a socket, including urllib, is disallowed. GAE instead requires that you use its custom URL loader. This means that the standard SQS and S3 Python modules provided by Amazon won't work without some modifications.
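The modification mostly amounts to swapping the transport. A minimal sketch of the idea, not the actual patch: `FetchTransport` and `fake_fetch` are names I am inventing here, and the only real API assumed is `urlfetch.fetch(url, payload=..., method=..., headers=...)`, which returns an object with `status_code` and `content`. The fake lets the code run off App Engine.

```python
try:
    # Only importable inside the App Engine runtime.
    from google.appengine.api import urlfetch
except ImportError:
    urlfetch = None


class FetchTransport(object):
    """Hypothetical shim: send HTTP requests through a fetch function
    instead of a socket-based module."""

    def __init__(self, fetch=None):
        if fetch is None:
            if urlfetch is None:
                raise RuntimeError("no fetch function available")
            fetch = urlfetch.fetch
        self._fetch = fetch

    def request(self, method, url, body=None, headers=None):
        result = self._fetch(url, payload=body, method=method,
                             headers=headers or {})
        return result.status_code, result.content


class _FakeResult(object):
    """Mimics the two response attributes the shim reads."""

    def __init__(self, status_code, content):
        self.status_code = status_code
        self.content = content


def fake_fetch(url, payload=None, method="GET", headers=None):
    """Local stand-in for urlfetch.fetch: echoes the request line."""
    return _FakeResult(200, ("%s %s" % (method, url)).encode("utf-8"))
```

A client library then calls `transport.request("PUT", url, body=data)` wherever it used to open a connection; on GAE you construct `FetchTransport()` with no arguments, and in a local test you pass `fetch=fake_fetch`.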

I've put together versions of both modules that are usable from within GAE. The module for talking to S3 is a simple patch of Amazon's boilerplate module to use GAE's URL fetcher instead of urllib. The SQSUrlBuilder module is a factory for generating properly signed queue-manipulation URLs.
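To give a flavor of what "properly signed" means, here is a sketch of query-string signing in the style of AWS Signature Version 2 (HMAC-SHA256 over the method, host, path and sorted parameters). This is an illustration under assumptions, not the SQSUrlBuilder code: the function name, its arguments and the choice of signature version are mine, and the scheme your queue expects may differ. The nice property is that building the URL needs no network access at all, so it sidesteps the socket restriction entirely.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _encode(value):
    # RFC 3986 percent-encoding: leave only A-Z a-z 0-9 - _ . ~ unescaped.
    return quote(str(value), safe="-_.~")


def build_signed_url(host, path, params, access_key, secret_key, timestamp):
    """Return a GET URL carrying a Signature Version 2 style signature.

    Hypothetical helper; `timestamp` is an ISO 8601 string supplied by
    the caller so the result is reproducible.
    """
    query = dict(params)
    query.update({
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
    })
    # Parameters are sorted by name before signing.
    canonical = "&".join("%s=%s" % (_encode(k), _encode(v))
                         for k, v in sorted(query.items()))
    string_to_sign = "\n".join(["GET", host.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "https://%s%s?%s&Signature=%s" % (
        host, path, canonical, _encode(signature))
```

Usage would look something like `build_signed_url("queue.amazonaws.com", "/", {"Action": "ListQueues"}, key_id, secret, "2008-08-13T12:00:00Z")`, with the resulting URL handed to GAE's URL fetcher.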

Idle Words

brevity is for the weak







Your Host

Maciej Ceglowski





