I was on a conference call yesterday and the topic of ways to store persistent data when using Amazon EC2 came up a couple of times. It would be really cool to have a persistent instance of a relational database like MySQL but there's nothing like that around at the moment. An instance can have a copy of MySQL installed and can store as much data as it would like (subject to the 160GB size limit for the virtual disk drive) but there's no way to ensure that the data is backed up in case the instance terminates without warning.
Or is there?
It is fairly easy to configure multiple instances of MySQL in a number of master-slave, master-master, and other topologies. The master instances produce a transaction log each time a change is made to a database record. The slaves or co-masters keep an open connection to the master, reading the changes as they are logged and mimicing the change on the local copy. There can be some replication delay for various reasons, but the slaves have all of the information needed to maintain exact copies of the database tables on the master.
Put another way, the master essentially implements a simple service API for fetching changes as they occur and the slaves slavishly do those same changes.
Hmmm...services...
What if the slave (client) wasn't another instance of MySQL? What if it was a very simple application which pulled down the transaction logs and wrote them into Amazon S3 objects on a frequent and regular basis? If the master were to disappear without warning (I could say crash here, but I won't), the information needed to restore the database to an earlier state would be safely squirreled away in S3.
For recovery, we need another service. This one pretends to be a master, but it simply pulls out that squirreled-away cache of transactions logs from S3 and feeds them to a MySQL instance which it is temporarily slaved to. After a replay of all of the transactions the slave becomes the master and processing resumes.
Make sense? Could this work? What do you think, Brian?
I'd better run, or I'll be late for my talk!
-- Jeff;


There are 2 different considerations here, aren't there?
1. How do I ensure reliability so that my application can continue to see the database in the event of an instance failing?
2. How do I ensure I lose no data in the event of a failure.
I can see how this solves (2), but not (1)...
Posted by: Tom Hume | September 22, 2006 at 07:07 AM
I am wondering whether we could use one of MySQL pluggable Storage Engines (Table Types) "S3 Storage Engine" as another alternative - http://dev.mysql.com/tech-resources/articles/mysql_5.0_psea1.html?
Any MySQL experts can validate this?
Posted by: Jinesh | September 22, 2006 at 10:13 AM
As I've mentioned several times on the EC2 forum, writing transaction logs to S3 isn't good enough given the unpredictable propagation time of S3. When I start up a new database instance on EC2 and read the transaction logs, I want to know that I have _all_ of the transaction logs and that none of them are stuck in S3 waiting to propagate.
As I've also mentioned on the forum, if EC2 provided a "remember this sequence number for me" service, this problem could be avoided by persistently storing the number of the last transaction logged so that a new instance would know whether all the data had arrived through S3 yet.
Posted by: Colin Percival | September 22, 2006 at 03:20 PM
Yes, by and large you are correct. But the devil is in the details, many of them!
For example, MyISAM tables do not replicate correctly because the replication (binary) log does not obey transactions. You can have a transaction that rolls back on the master but it's in the log and happily executed on the slave. Ouch!
Even if you use InnoDB tables, you are not safe. For example, you can write non-deterministic SQL statements that may produce different results on the slave than on the master. An example is creating a table with an auto-increment key using a select from another table. The keys assigned depend on the order in which the select produces results. This may be different on the slave than the master and you end up with different keys, which is not going to match subsequent operations! (A friend ran into this one, one slave matched the master while two others became inconsistent quickly, he spent hours and hours figuring out what went wrong!)
You also assume multiple EC2 instances in your description, but there is no control over their location. They may well be on the same power circuit that goes out. What about the S3 node? I forget the semantic details: when S3 ACKs the store request, does it guarantee that the data is replicated already or does it only guarantee that it is on persistent storage? In other words, could it be on a machine in the same datacenter on the same UPS that goes down with the EC2 mysql box and doesn't come back up until the UPS is repaired a few hours later? In that case, yes, the data would be safe on S3, but unavailable for a few hours. That wouldn't help me in quicly restoring from S3 onto a new mysql instance, would it?
Someone else also mentioned restore time on the EC2 forum. How long would it take to restore a 70GB database? (70GB is probably the max you can put on an EC2 instance if you ever want to be able to make a backup copy without jumping through hoops.) My guess is at least 10-15 minutes, and that doesn't count the time to apply the logs.
Oh, talking about logs, when do you start your logs? You need a clean full backup and then you can start the (incremental) replication logs. After a while, you really should start afresh so you don't collect endless logs that would take forever to apply. If your app is not 24/7 it's easy, but if it's 24x7 you need to do what's called a hot-backup. Ahhh, no such support in mysql/innodb unless you pay for it.
And performance? Did you know that a mysql slave can easily be slower than the master? The reason is that the slave reads and applies the replication log using a single thread. So while your app server bangs on your master with high concurrency, your slave performs one operation at a time. Usually this is not an issue because the master has the additional load of reads, but the problem does exist.
The bottom line is that in theory everything is available to set-up a nice mysql installation on EC2/S3, but in practice it's far from easy to actually pull it off in a reliable manner.
[Sorry for the long comment]
Posted by: Thorsten | September 22, 2006 at 10:27 PM
Also, MySQl replication is statement based, not data based. Thus you can't use the binary logs to back up to S3. To use S3 for back up, one one would have to periodically run queries to take diffs from your db and put that data into S3.
So consider something like the following:
1. Beginning of every week, take a full dump of your db and put that data into S3.
2. From then on, with whatever frequency you consider appropriate for your app, run queries to capture deltas and put those into S3.
This way it is possible to use S3 as a backup for you datastore. However, that is very distinct from a MySQL slave. A slave is meant to serve as a hot swap in. So if your master goes down, you should be able to switch to the slave.
None of this ofcourse guarentees 0 data loss, since there is a delay. Still it may work for certain types of applications.
Posted by: Tushar Jain | September 23, 2006 at 08:30 AM
Jinesh: That pretty much strikes me as the neatest solution. I have no idea if/now it would work in practice, though.
Basically, the overall problem is one thing: the application framework I use (Ruby on Rails) talks to an SQL backend. That's pretty well ingrained into it and so I want an SQL backend to talk to. Really, I don't care how it's implemented, and I don't want to have to worry about the fiddly details, and I don't want to go to the effort of *verifying* that it does in fact work.
Maybe there's a market there for somebody to take Jeff's suggestion, run with it and sell the resulting service to the rest of us who don't want to worry about the details?
Posted by: mathie | September 25, 2006 at 08:33 AM
A contributor named Navin sent me the following:
This would mostly work, except that there will be a lag (however small) between the time a transaction is committed into the MySQL instance, and it is copied by the slave "service'. The lag occurs because the MySQL instance is not aware of the slave, and so does not wait for it before returning from its commit. The above lag is not a problem if one is willing to lose some transactions (imagine the MySQL instance becoming unavailable after writing the last few records that have still not been copied by the slave), but certainly not acceptable where one is not allowed to lose any data (imagine if the last transaction was a credit of $1M to your account, and it is not there the next day!).
thanks
Navin
Posted by: Jeff Barr | September 25, 2006 at 09:31 AM
The "real" solution seems very simple to me, EC2 should not throw away the hard drive instance when the server crashes, it should reboot the machine. It should also be possible to "mount" that drive in another EC2 should the instance be permanently corrupted and you need to pull the data off the original instance. This gives you the same level of uptime you have in any other data center, and if you can do it faster due to web services and faster up/down speed of bringing up new instances all the better. I think amazon found out that the biggest market for EC2 isn't in giant clusters doing mathematical calculations (although I do have a few ideas for using it for this) but instead in a redundant high speed, high bandwidth hosting service.
Posted by: Robert Hostetter | September 25, 2006 at 09:37 AM
Thorsten's idea is similar to what Amanda developers have been working on. Our idea to use S3 as one of the backup media supported by Amanda. With S3 device plugin (http://wiki.zmanda.com/index.php/Device_API), Amanda can do full and incremental backups of
MySQL to S3.
Posted by: Paddy Sreenivasan | October 03, 2006 at 10:47 AM