Using Google App Engine With Amazon Web Services
(My non-technical readers should pull the ripcord here)
Sometimes it can be handy to duct-tape Google App Engine to other web tools, such as Amazon's S3 (storage) or SQS (message queue) services.
For example, I have been building a little search engine that can import and index a complete list of bookmarks from a del.icio.us account. Depending on how many bookmarks the account contains, this import can take a few dozen seconds.
Since GAE doesn't allow you to run background tasks, the browser will hang while this import runs. If it takes too much time (which can happen for large collections of bookmarks), the import risks being killed by the GAE hosting environment. But even if it finishes before being killed, the user is stuck looking at what appears to be an unresponsive browser page for however long it takes to complete.
To avoid this problem, I have rigged the upload form handler in my app to store the user's uploaded bookmarks file to an S3 account and then put a message on an SQS queue. A faraway worker process (living in a cloud on an EC2 server) polls this queue, dutifully retrieves the file, does its indexing magic and then uploads the bookmarks into the user's account using GAE's bulk loading API. While this is happening in the background, the user can continue to interact with the web application as usual. After a few seconds, imported bookmarks begin to appear in his account, and within a few minutes the account is up to date.
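The shape of that hand-off can be sketched in a few lines. This is only an illustration of the pattern, not the code from my app: a plain dict stands in for the S3 bucket, a queue.Queue stands in for the SQS queue, and the names handle_upload and run_worker_once are made up for the example. The "indexing" is a placeholder that counts lines.

```python
import json
import queue

# In-memory stand-ins (assumptions for illustration): a dict plays the role
# of the S3 bucket and a queue.Queue plays the role of the SQS queue.
bucket = {}
jobs = queue.Queue()


def handle_upload(user, file_bytes):
    """Web handler side: stash the file, enqueue a job, return right away."""
    key = "imports/%s.html" % user
    bucket[key] = file_bytes
    # The message carries only a pointer to the file, not the file itself.
    jobs.put(json.dumps({"user": user, "key": key}))
    return "Import queued"


def run_worker_once(index):
    """Worker side: take one job off the queue and index the stored file.

    Returns True if a job was processed, False if the queue was empty.
    """
    try:
        message = json.loads(jobs.get_nowait())
    except queue.Empty:
        return False
    data = bucket.pop(message["key"])
    # Placeholder for the real indexing step: count lines per user.
    index[message["user"]] = len(data.splitlines())
    return True
```

The point of the split is that handle_upload does nothing slow, so the request returns immediately, while the worker can take as long as it likes.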
This workers + queue setup is a very common way of handling asynchronous tasks in web apps, but setting up communication between GAE and Amazon Web Services can be tricky due to security restrictions in Google's Python runtime. In particular, any Python module that wraps a socket, including urllib, is disallowed. GAE instead requires that you use its custom URL loader. This means that the standard SQS and S3 Python modules provided by Amazon won't work without some modifications.
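To give a rough idea of what such a modification looks like, here is a minimal sketch of routing a request through a fetch function rather than opening a connection directly. Passing the fetcher in as an argument is a choice I made for this example so it can be run anywhere; the helper name http_request is hypothetical. On App Engine the fetcher would be google.appengine.api.urlfetch.fetch, which takes url, payload, method and headers arguments and returns an object with status_code and content attributes.

```python
def http_request(fetch, method, url, body=None, headers=None):
    """Send one HTTP request through an injected fetch function.

    `fetch` is expected to behave like App Engine's urlfetch.fetch:
    it accepts url, payload, method and headers, and returns an object
    exposing `status_code` and `content`.
    """
    response = fetch(url, payload=body, method=method, headers=headers or {})
    return response.status_code, response.content
```

Inside a GAE handler this would be called as http_request(urlfetch.fetch, "PUT", url, body, headers) after importing urlfetch from google.appengine.api; anywhere else you can hand it a fake fetcher for testing.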
I've put together versions of both modules that are usable from within GAE.
The module for talking to S3 is a simple patch of Amazon's boilerplate module to use GAE's URL fetcher instead of urllib. The SQSUrlBuilder module is a factory for generating properly signed queue-manipulation URLs.
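For the curious, here is roughly what "properly signed" means. The sketch below is not SQSUrlBuilder itself; it is an independent illustration using the AWS Signature Version 2 query scheme (sorted, percent-encoded parameters, signed with HMAC-SHA256), written against the Python 3 standard library. The function name sign_query_url and the endpoint in the usage note are hypothetical, and current AWS services generally expect Signature Version 4 instead, so treat this as a picture of the idea rather than something to paste into production.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def sign_query_url(endpoint, params, access_key, secret_key, timestamp):
    """Build a GET URL signed in the style of AWS Signature Version 2."""
    parts = urlsplit(endpoint)
    path = parts.path or "/"

    all_params = dict(params)
    all_params.update({
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
    })

    # Canonical query string: parameters sorted by name, RFC 3986 encoding.
    canonical = "&".join(
        "%s=%s" % (quote(k, safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(all_params.items())
    )

    # The string to sign covers the verb, host, path and query string.
    string_to_sign = "\n".join(["GET", parts.netloc.lower(), path, canonical])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")

    return "%s://%s%s?%s&Signature=%s" % (
        parts.scheme, parts.netloc, path, canonical,
        quote(signature, safe="-_.~"),
    )
```

A call such as sign_query_url("https://queue.example.com/123/myqueue", {"Action": "SendMessage", "MessageBody": "hello world"}, access_key, secret_key, "2008-05-01T12:00:00Z") returns a plain URL string, which is convenient on GAE because the URL fetcher can then request it without any socket-level code.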
Your Host
Maciej Cegłowski
maciej @ ceglowski.com
Threat
Please ask permission before reprinting full-text posts or I will crush you.