Cloud Computing, Programming December 15th, 2009
Google’s App Engine 1.3.0 was released yesterday along with a brand new Blobstore API allowing the storage and serving of files up to 50MB.
Store and Serve – Files can be uploaded and stored as blobs, to be served later in response to user requests. Developers can build their own organizational structures and access controls on top of blobs.
The way this API works is pretty simple. To upload files, you call an API that manufactures a POST URL, and web forms containing file data are submitted to that URL. App Engine processes the POST request and creates the blobs in its storage (along with BlobInfo objects: read-only datastore entities containing the metadata for each blob). It then rewrites the request, removing the uploaded file data and replacing it with Blobstore keys pointing to the stored blobs in the App Engine Blobstore, and calls your handler with the rewritten request.
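To make the order of operations concrete, here is a toy, in-memory model of the upload flow in Python. It is not the real SDK (in the Python SDK the upload URL comes from `blobstore.create_upload_url()`); the `ToyBlobstore` class, its methods and the URL format are hypothetical stand-ins that only mirror the steps described above.

```python
import secrets
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobInfo:
    """Read-only metadata record, loosely modeled on App Engine's BlobInfo."""
    key: str
    filename: str
    size: int


class ToyBlobstore:
    """In-memory model of the upload pipeline; NOT the real Blobstore API."""

    def __init__(self):
        self.blobs = {}        # blob key -> raw bytes
        self.infos = {}        # blob key -> BlobInfo metadata
        self.upload_urls = {}  # generated upload path -> app handler path

    def create_upload_url(self, success_path):
        # Manufacture a unique POST URL for the web form to submit to
        # (the path format is made up for this sketch).
        path = "/upload/" + secrets.token_urlsafe(16)
        self.upload_urls[path] = success_path
        return path

    def handle_post(self, path, fields, files, handlers):
        success_path = self.upload_urls[path]
        rewritten = dict(fields)
        for field_name, (filename, data) in files.items():
            # Store each file as a blob before any application code runs.
            key = uuid.uuid4().hex
            self.blobs[key] = data
            self.infos[key] = BlobInfo(key, filename, len(data))
            # Rewrite the request: the file data is replaced by its blob key.
            rewritten[field_name] = key
        # Only now is the application's handler invoked.
        return handlers[success_path](rewritten)


# Demo: the app handler only ever sees blob keys, never file bytes.
store = ToyBlobstore()
received = []
url = store.create_upload_url("/upload_done")
store.handle_post(url, {"title": "cat"}, {"photo": ("cat.jpg", b"JPEGDATA")},
                  {"/upload_done": received.append})
```

The detail worth noticing is that `handle_post` writes to storage before the application handler gets any chance to look at the request.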
To serve an existing blob in your app, you put a special header in the response containing the blob key. App Engine replaces the body of the response with the content of the blob.
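Serving can be modeled the same way. In this toy Python sketch (again, not the SDK), the app handler only names the blob in a response header, and a stand-in for App Engine's frontend swaps in the blob's bytes. The header name `X-AppEngine-BlobKey` is assumed to match the one the Python SDK uses; `Response`, `serve_handler` and `frontend_rewrite` are hypothetical, and returning 404 for an unknown key is this sketch's own choice.

```python
from dataclasses import dataclass, field

BLOB_KEY_HEADER = "X-AppEngine-BlobKey"  # assumed header name


@dataclass
class Response:
    status: int = 200
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def serve_handler(blob_key):
    # The app never reads the blob itself; it only names it in a header.
    resp = Response()
    resp.headers[BLOB_KEY_HEADER] = blob_key
    return resp


def frontend_rewrite(resp, blobs):
    # Toy model of the outbound step: replace the body with the blob content.
    key = resp.headers.pop(BLOB_KEY_HEADER, None)
    if key is None:
        return resp  # ordinary response, left untouched
    if key not in blobs:
        return Response(status=404, body=b"blob not found")
    resp.body = blobs[key]
    return resp


blobs = {"abc123": b"hello blob"}
served = frontend_rewrite(serve_handler("abc123"), blobs)
missing = frontend_rewrite(serve_handler("nope"), blobs)
```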
The Blobstore approach is pretty straightforward, but there are a few concerns with it:
1. What about request validation (authentication/authorization, etc.)?
When uploading files, the request reaches your code only after the blobs have already been processed and stored. This means you can only handle authentication/authorization, or even form validation, after the data has been stored.
As a result, you'll have to write code to clean up the relevant blob entries whenever authentication, authorization or validation fails: more datastore API calls, more CPU…
It also means that, unless you take care of these special cases, any newbie hacker with a simple sniffer (or Firebug) can start uploading, and potentially serving, files off your service (see update).
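Here is a sketch of what that after-the-fact cleanup looks like, assuming the upload handler receives the form fields plus the list of already-stored blob keys. `FakeBlobStore`, `upload_done` and `allowed_users` are hypothetical names; in a real app the deletion would be a Blobstore delete call, and each such call is the extra API traffic mentioned above.

```python
class FakeBlobStore:
    """Hypothetical stand-in: blobs are already written when the handler runs."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)
        self.delete_calls = 0  # each call stands in for an extra API round trip

    def delete(self, keys):
        self.delete_calls += 1
        for key in keys:
            self.blobs.pop(key, None)


def upload_done(fields, blob_keys, store, allowed_users):
    """Runs AFTER storage, so every rejection has to undo work already done."""
    if fields.get("user") not in allowed_users:
        store.delete(blob_keys)  # unauthorized: blobs would otherwise be orphaned
        return 403
    if not fields.get("title"):
        store.delete(blob_keys)  # invalid form: same cleanup
        return 400
    return 200


store = FakeBlobStore({"k1": b"spam", "k2": b"photo", "k3": b"draft"})
denied = upload_done({"user": "mallory", "title": "x"}, ["k1"], store, {"alice"})
invalid = upload_done({"user": "alice", "title": ""}, ["k3"], store, {"alice"})
accepted = upload_done({"user": "alice", "title": "cat"}, ["k2"], store, {"alice"})
```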
2. No way to preprocess data
As the file data is already stored before your handler is called, there's no way to preprocess submitted data other than reading it back from the store, processing it and storing it again.
There's also no straightforward API for reading or writing blob data from code, so the above process has to be implemented using URL fetching (fetch the image via an HTTP call, process it, then store it again with an HTTP POST).
There must be a way for the Google App Engine team to wrap this up nicely and provide a clean API for doing it efficiently (while also solving the validation problem described above).
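The download, process, re-upload round trip from point 2 can be sketched as follows. `reprocess_blob`, `fake_get` and `fake_post` are hypothetical names; the two HTTP helpers are injected so the sketch runs offline, whereas on App Engine they would be URL Fetch calls, and a real re-upload would also have to be a multipart form POST to a freshly created upload URL.

```python
def reprocess_blob(blob_url, upload_url, transform, http_get, http_post):
    """Round trip forced by the API: download, transform, upload again."""
    original = http_get(blob_url)            # 1st HTTP call: read the stored blob
    processed = transform(original)          # the actual preprocessing step
    return http_post(upload_url, processed)  # 2nd HTTP call: store it again


# Offline fakes standing in for URL Fetch.
served_blobs = {"/serve/k1": b"raw image"}
uploaded = []
calls = []


def fake_get(url):
    calls.append(("GET", url))
    return served_blobs[url]


def fake_post(url, data):
    calls.append(("POST", url))
    uploaded.append((url, data))
    return "new-key"


new_key = reprocess_blob("/serve/k1", "/upload/once", bytes.upper, fake_get, fake_post)
```

Note that the unprocessed original is still stored afterwards, so a real app would also need to delete it.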
As the Blobstore API is still in its experimental phase, I guess we'll see some quick progress on its development, and hopefully the Google team will solve the issues above.
At least now there's the beginning of an alternative to Amazon S3 for App Engine applications.
Update: Brett Slatkin notes that when the API manufactures the POST URL used for uploading files, it creates a unique one-time URL, which mitigates any potential sniffing.
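Why a one-time URL defeats replay can be shown with a small single-use token model. This is a hypothetical sketch (`OneTimeUploadUrls` is not an SDK class, and it says nothing about how App Engine actually generates or expires its URLs): each URL carries an unguessable random token, and redeeming it removes it, so a sniffed copy is useless once the legitimate upload has gone through.

```python
import secrets


class OneTimeUploadUrls:
    """Toy model of single-use upload URLs (hypothetical, not the SDK)."""

    def __init__(self):
        self._pending = {}  # token -> handler path the upload is forwarded to

    def create(self, success_path):
        token = secrets.token_urlsafe(16)  # unguessable random token
        self._pending[token] = success_path
        return "/upload/" + token

    def redeem(self, url):
        token = url.rsplit("/", 1)[-1]
        # pop() makes the URL single-use: a replayed or sniffed URL finds nothing.
        return self._pending.pop(token, None)


urls = OneTimeUploadUrls()
u = urls.create("/upload_done")
first = urls.redeem(u)
replay = urls.redeem(u)
```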
A one-time URL fits perfectly when you're rendering a web form to be submitted by the user. But it makes things harder if you're trying to provide a REST API that allows uploading files (think of something like TwitPic). In that case you'll have to write your own layer that simulates what a web form would do (get the files, create the random one-time POST URL, call it, …).
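The core of such a layer is re-encoding the raw files as a `multipart/form-data` body, exactly as a browser form would, and POSTing it to the one-time upload URL. A minimal sketch using only the standard library: `encode_multipart`, `build_upload_request`, the field names `message`/`media` and the placeholder URL are all hypothetical. The request is built but not sent; on App Engine it would go out through URL Fetch to a URL freshly obtained from the upload-URL API.

```python
import secrets
import urllib.request


def encode_multipart(fields, files):
    """Build a multipart/form-data body the way a browser form would."""
    boundary = "----form" + secrets.token_hex(16)
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode()]
    for name, (filename, content_type, data) in files.items():
        lines += [f"--{boundary}".encode(),
                  (f'Content-Disposition: form-data; name="{name}"; '
                   f'filename="{filename}"').encode(),
                  f"Content-Type: {content_type}".encode(),
                  b"", data]
    lines += [f"--{boundary}--".encode(), b""]  # closing delimiter + final CRLF
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_upload_request(upload_url, fields, files):
    """Wrap the body in a POST aimed at the one-time upload URL (not sent here)."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(upload_url, data=body, method="POST",
                                  headers={"Content-Type": content_type})


req = build_upload_request("http://example.com/upload/TOKEN", {"message": "hi"},
                           {"media": ("pic.jpg", "image/jpeg", b"JPEGDATA")})
```

A REST endpoint would call something like `build_upload_request` with the bytes it received, then forward whatever the upload handler returns back to its own client.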