Earlier this week I was trying (and failing miserably) to explain an idea to a co-worker. I promised to clear up my thoughts and to encapsulate them in a blog post, so here goes.
In a nutshell, if you take a peek behind the scenes at a web site, you will find a highly configurable (and with any luck, very highly tuned) set of services -- application servers, database servers, queues, and so forth. In my experience, tuning even a single service to behave well under a particular load can be very difficult and time consuming. All too often the temptation is simply to add hardware, when in fact this just spreads the inefficiency to even more locations. On the other hand, finding the proper combination of configuration values can seem like a never-ending pursuit of a mythical sea creature.
Consider a database server such as MySQL and its associated configuration file, my.cnf. This file contains dozens of tunable parameters, many of which affect how much memory is allocated and how that memory is used. Examples of such parameters include the sort_buffer_size, join_buffer_size, and query_cache_size. Given the finite amount of RAM available on the system, it is simply not reasonable to set each of these parameters to overly generous values since there are multiplicative effects which would raise the overall amount of RAM consumed by MySQL to an impractically large value. There are also interaction effects, where raising one value makes one function more efficient while slowing down others.
Let's call the set of parameters that are of concern the parameter space. Perhaps we want to explore what happens as the sort_buffer_size varies from 1 MB up to 16 MB, while also varying (in all possible combinations) the join_buffer_size from 1 MB up to 32 MB. This is a two dimensional space, but it could have any number of dimensions from 1 on up.
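To make this concrete, here is a minimal sketch of enumerating that two dimensional parameter space. The parameter names and ranges mirror the MySQL example above; the 1 MB step size is an arbitrary choice for illustration.

```python
# Enumerate every combination of the two buffer sizes (values in MB).
from itertools import product

sort_buffer_sizes = range(1, 17)   # 1 MB through 16 MB
join_buffer_sizes = range(1, 33)   # 1 MB through 32 MB

parameter_space = list(product(sort_buffer_sizes, join_buffer_sizes))
print(len(parameter_space))  # 16 * 32 = 512 combinations
```

Each additional tunable parameter multiplies the number of combinations, which is why exhaustive exploration gets expensive quickly.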
Although there are a number of excellent guides to MySQL optimization, getting it right is still a big job.
I would like to propose the use of a structured benchmarking system built around Amazon EC2 to help developers measure and optimize complex servers similar to the one I've described above. Let's start with a simple setup using just three instances:
- The first instance is a controller or test harness. It requires network access to the other two instances. While iterating over the parameter space, the controller repeatedly sets up the server under test, fires off a simulated load, and collects the results.
- The second instance is the test subject server. It would hold the service to be tested and optimized, and would also incorporate a simple web service (called by the controller) with the power to set server parameters (e.g. MySQL's sort_buffer_size) and to start and stop the service. This instance would also contain a copy of the database (if relevant).
- The third and final instance is the load generator. Also under the direction of the controller, this instance produces a repeatable, controlled load on the test subject server. If the server in question is a database server, the load generator would fire off and benchmark a series of queries to the database, measure execution time, and report back to the controller when done.
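The load generator's core job is simple: run a fixed workload and time it. A minimal sketch, where run_query is a hypothetical stand-in for whatever actually executes one benchmark query against the test server:

```python
# Time a fixed list of queries against the server under test.
import time

def run_benchmark(queries, run_query):
    """Execute each query in order; return total wall-clock time in seconds."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    return time.perf_counter() - start

# Usage with a do-nothing stand-in workload:
elapsed = run_benchmark(["SELECT 1"] * 100, run_query=lambda q: None)
```

Because the workload is fixed and repeatable, the elapsed time can be compared directly across different server configurations.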
The controller iterates over the parameter space like this:
for sort_buffer_size in range(1, 17):        # 1 MB .. 16 MB
    for join_buffer_size in range(1, 33):    # 1 MB .. 32 MB
        # Configure test server (sort_buffer_size, join_buffer_size)
        # Start load generator and run test
        # Record results
The inner loop is executed 512 times, and the parameters and timing information are recorded after each iteration. The net result (if viewed graphically) would be a 2-dimensional grid of parameter values and the resulting execution times (other metrics could also be gathered, of course). Visual inspection of this grid (for local minima and maxima) would provide considerable insight into the server's performance in different configurations.
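Once the grid has been recorded, picking out the best-performing configuration is a simple scan. A minimal sketch, assuming each result is stored as a (sort_buffer_size, join_buffer_size, execution_time) tuple; the sample numbers below are made up for illustration:

```python
# Scan recorded benchmark results for the fastest configuration.
# Each tuple: (sort_buffer_size MB, join_buffer_size MB, execution_time sec).
results = [
    (1, 1, 42.0),
    (4, 8, 31.5),
    (8, 16, 28.7),
    (16, 32, 35.2),
]

best = min(results, key=lambda r: r[2])
print(best)  # the configuration with the lowest execution time
```

In practice you would keep all 512 rows and plot them as a heat map, but even this one-liner answers the basic question of which configuration won.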
This model could be expanded to use multiple load generators and/or test servers, and it could also serve to test the fairness of a load balancer.
It might also be possible to avoid an exhaustive search of the parameter space by using Monte Carlo methods (trying some points at random and then paying more attention to the most promising areas).
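A minimal sketch of that two-stage idea: sample some points at random, then sample more densely around the best one found. The benchmark function below is a synthetic stand-in for a full configure-load-measure cycle (I've given it a minimum near (8, 16) purely for illustration); a real run would call out to the controller instead.

```python
# Two-stage random search over the (sort_buffer_size, join_buffer_size) space.
import random

def benchmark(sort_mb, join_mb):
    # Synthetic cost function standing in for a real benchmark run.
    return (sort_mb - 8) ** 2 + (join_mb - 16) ** 2

random.seed(0)

# Stage 1: coarse random sampling across the whole space.
samples = [(random.randint(1, 16), random.randint(1, 32)) for _ in range(20)]
best = min(samples, key=lambda p: benchmark(*p))

# Stage 2: refine by probing the immediate neighborhood of the best point,
# clamped to the valid parameter ranges.
s0, j0 = best
neighbors = [(max(1, min(16, s0 + ds)), max(1, min(32, j0 + dj)))
             for ds in (-1, 0, 1) for dj in (-1, 0, 1)]
best = min(neighbors + [best], key=lambda p: benchmark(*p))
print(best)
```

Twenty coarse samples plus a handful of refinements is far cheaper than 512 full benchmark runs, at the cost of possibly missing the true optimum.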
At first glance this might not sound like it has a lot to do with EC2, but something I hear a lot when I go out on speaking tours and get to talk one-on-one with developers is that they simply don't have much in the way of infrastructure to test scalability, performance under load, or alternative configurations. Any and all available hardware is currently configured to be part of the production system, and there's simply no test hardware to spare in advance of weekly or monthly site updates. Given the sporadic need for such hardware (on average you need almost none, but for a couple of hours a month you need a lot), an on-demand solution like EC2 makes perfect sense. Even if you need to run 3 instances flat-out for 24 hours, that will cost you just $7.20. That's a pretty small price to pay to get a highly tuned server.
I would be very interested in hearing (via comments) your thoughts on this quick note.