DBs on Amazon EC2 + S3
July 10, 2007
Dare Obasanjo is doing what my lazy ass can’t do: he’s developing a Facebook app just to see what the architectural hurdles are for large-scale web sites. In this post, he argues that Amazon Web Services (AWS) won’t work because it doesn’t provide persistent storage. If the virtual instance running your DB goes down, you lose all your data unless you’ve explicitly backed it up to S3. Remember, the hard drive in EC2 is virtual and transitory. This seems like a problem that must have a reasonable solution.
On the Amazon Web Services blog, Jeff suggests having one DB instance store incremental backups in S3. For recovery, a DB instance rolls up all the incremental logs into a new snapshot of the DB. I’d suggest running this recovery instance frequently so you always have a recent DB snapshot. That way you can launch new instances from fresh data, recovery is much quicker, and you don’t end up with terabytes of incremental logs in S3. Most of the complaints in the comments are easy to solve. Some of the remaining issues (transaction logs?) are really MySQL problems, and they would trip you up on your own server farm too.
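Here is a toy Python sketch of that snapshot-plus-incremental-logs scheme. It assumes the database is a simple key-value dict and that each incremental backup is a batch of (key, value) writes, with None meaning delete. `FakeS3` is a hypothetical in-memory stand-in for a bucket, not the real S3 API, and the `snapshot` and `log/` key names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One write in an incremental log: (key, value); value None means delete.
Write = Tuple[str, Optional[str]]


class FakeS3:
    """Hypothetical in-memory stand-in for an S3 bucket (not the real AWS API)."""

    def __init__(self) -> None:
        self.objects: Dict[str, object] = {}

    def put(self, key: str, body: object) -> None:
        self.objects[key] = body

    def get(self, key: str) -> object:
        return self.objects[key]

    def exists(self, key: str) -> bool:
        return key in self.objects

    def delete(self, key: str) -> None:
        del self.objects[key]

    def list(self, prefix: str) -> List[str]:
        # Sorted so zero-padded sequence numbers come back in write order.
        return sorted(k for k in self.objects if k.startswith(prefix))


def upload_incremental(s3: FakeS3, seq: int, writes: List[Write]) -> None:
    # The primary DB instance ships each batch of writes as its own object.
    s3.put(f"log/{seq:010d}", list(writes))


def apply_writes(db: Dict[str, str], writes: List[Write]) -> None:
    for key, value in writes:
        if value is None:
            db.pop(key, None)
        else:
            db[key] = value


def restore(s3: FakeS3) -> Dict[str, str]:
    # Start from the latest snapshot (if any) and replay newer logs in order.
    db: Dict[str, str] = dict(s3.get("snapshot")) if s3.exists("snapshot") else {}
    for key in s3.list("log/"):
        apply_writes(db, s3.get(key))
    return db


def roll_up(s3: FakeS3) -> None:
    # Recovery instance: fold all logs into a fresh snapshot, then drop the logs.
    logs = s3.list("log/")
    db = restore(s3)
    s3.put("snapshot", dict(db))
    for key in logs:
        s3.delete(key)


if __name__ == "__main__":
    s3 = FakeS3()
    upload_incremental(s3, 1, [("user:1", "alice"), ("user:2", "bob")])
    upload_incremental(s3, 2, [("user:2", None), ("user:3", "carol")])
    roll_up(s3)
    upload_incremental(s3, 3, [("user:1", "alicia")])
    print(restore(s3))  # {'user:1': 'alicia', 'user:3': 'carol'}
```

Because each write in this model just sets or deletes a key, replaying a log that has already been folded into the snapshot leaves the data unchanged, so a crash between saving the snapshot and deleting the old logs is harmless here. Real database logs aren't necessarily that forgiving, so treat that as an assumption to check.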
It seems like people are trying to get existing software to work on AWS without much fiddling. But AWS is significantly different from a conventional server farm, so it requires different solutions. Most of the startups (even big ones) presenting at the New York Tech Meetup are using AWS and swear it’s the greatest thing ever. I also think people should consider combining AWS with their own hardware. For example, run the masters on your own server farm, but run the slaves and caches in AWS. I finally got into the AWS beta program, so I hope to try some of this out someday.
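A toy Python sketch of that split might look like the following. Everything here is hypothetical: `Master` stands for a database on your own hardware, `Slave` for a read replica on an EC2 instance, and a plain dict stands in for the cache tier. Replication is modeled as slaves pulling from the master's write log, so they can lag behind it.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Master:
    """Hypothetical master DB on your own hardware: accepts all writes."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []  # ordered writes the slaves replay

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Hypothetical read replica running on an EC2 instance."""

    def __init__(self, master: Master) -> None:
        self.master = master
        self.data: Dict[str, str] = {}
        self.position = 0  # how far into the master's log this slave has replayed

    def catch_up(self) -> None:
        # Replication is a pull, so until this runs the slave lags the master.
        for key, value in self.master.log[self.position:]:
            self.data[key] = value
        self.position = len(self.master.log)

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)


class Router:
    """Sends writes to the master; reads go to the cache, then round-robin slaves."""

    def __init__(self, master: Master, slaves: List[Slave]) -> None:
        self.master = master
        self.slaves = itertools.cycle(slaves)
        self.cache: Dict[str, str] = {}  # stands in for a cache tier in EC2

    def write(self, key: str, value: str) -> None:
        self.master.write(key, value)
        self.cache.pop(key, None)  # invalidate the cached copy of this key

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = next(self.slaves).read(key)
        if value is not None:
            self.cache[key] = value
        return value
```

The catch with this design is that a read right after a write can hit a slave that hasn't caught up yet and return stale or missing data, and the cache can then hold on to that stale value until the next write to that key. How much staleness you can tolerate decides how aggressively the slaves in AWS need to replicate from the masters on your own hardware.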