January 4, 2007
I’d like to build a web-scale database infrastructure layered on top of Amazon’s EC2 and S3. EC2 provides “unlimited” compute power and S3 provides “unlimited” storage space. Together they could offer “unlimited” database storage and processing, exposed through a web query language.

Why a DB? Because most Web 2.0 sites are just scripts running on top of one. Startups still have to buy racks of machines to host their DBs, which is expensive and annoying. If they could instead use a DB already running on Amazon’s machines, deploying web applications would be vastly easier. For example, del.icio.us is a dead simple app that stores your bookmarks in a DB. Even Gmail is just storing all your email in a giant DB.

Rather than being limited to what these applications expose, you could write more powerful apps with direct access to your data via SQL or XQuery. You could also upload additional processing code into EC2 to manipulate the results directly. I wonder whether a relational model is appropriate for a web database, or whether something simpler might be more scalable.
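To make the "direct access via SQL" idea a bit more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in for the hosted database. The `bookmarks` table, its columns, the helper names, and the sample rows are all made up for illustration. This is not del.icio.us's actual schema, and nothing here talks to EC2 or S3.

```python
import sqlite3


def build_bookmark_db():
    """Create an in-memory DB with a hypothetical bookmarks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookmarks (owner TEXT, url TEXT, tag TEXT)")
    # Sample rows are illustrative only.
    rows = [
        ("alice", "http://aws.amazon.com/ec2", "aws"),
        ("alice", "http://aws.amazon.com/s3", "aws"),
        ("alice", "http://www.sqlite.org", "db"),
        ("bob", "http://www.w3.org/TR/xquery/", "xquery"),
    ]
    conn.executemany("INSERT INTO bookmarks VALUES (?, ?, ?)", rows)
    return conn


def top_tags(conn, owner):
    """Return (tag, count) pairs for one owner, most-used tags first."""
    cur = conn.execute(
        "SELECT tag, COUNT(*) AS n FROM bookmarks "
        "WHERE owner = ? GROUP BY tag ORDER BY n DESC, tag ASC",
        (owner,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_bookmark_db()
    for tag, n in top_tags(conn, "alice"):
        print(tag, n)
```

A bookmarking app might never expose a "most-used tags" view, but with direct query access it is a single statement. Whether such a query would stay cheap at web scale is exactly the open question about the relational model.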